(54) Title: A LIBRARY OF A COLLECTION OF CELLS



(57) Abstract: The present invention relates to combinatorial gene expression libraries and methods for making these. Such libraries
are useful in the discovery of novel and/or enhanced metabolic pathways leading to the production of novel compounds, e.g. for drug
discovery, and/or to the production of known compounds in novel quantities or in novel compartments of the cells. The expression
libraries in particular are composed of host cells capable of co-ordinated and controllable expression of large numbers of heterologous
genes in the host cells.
A library of a collection of cells

This application is a nonprovisional of U.S. provisional application Serial No.
60/300,863 filed 27 June 2001, which is hereby incorporated by reference in its
entirety. The application claims priority from Danish patent application number PA
2001 00128 filed 25 January 2001 and PA 2001 00679 filed on 1 May 2001, which
are hereby incorporated by reference in their entirety. All patent and nonpatent
references cited in the application, or in the present application, are also hereby
incorporated by reference in their entirety.


The present invention relates to a library of a collection of cells and a method for
producing said library. The library is useful as a starting material for evolving cells or
compositions having new properties.

Technical field

The present invention relates to combinatorial gene expression libraries and
methods for making these. Such libraries are useful in the discovery of novel and/or
enhanced metabolic pathways leading to the production of novel compounds, e.g. for
drug discovery, and/or to the production of known compounds in novel quantities or
in novel compartments of the cells.

Background of the invention 

Methods are known to provide recombined combinatorial gene expression libraries
by crossing and recombination between cells comprising expression constructs (WO
00/52180 Terragen Discovery Ltd). Through the recombination, which may be
carried out in vitro using the recA recombination enzyme, novel genes are obtained,
which may or may not be functional in the host cell.


One drawback of the libraries of the prior art is that they can only be evolved
through crossing and recombination between cells, whereby homologous or
homeologous genes are recombined, resulting in novel genes whose products have
slightly changed properties such as substrate specificity, solubility, cellular
location etc.
Furthermore, once the expression constructs have been inserted into the cells, the
specific gene combination of each cell is static. Novel combinations may be obtained
by crossing and recombination, but this will also lead to the formation of novel genes
through cross-over, and these novel genes may no longer be functional.

Furthermore, the expression of the inserted expression constructs is a co-expression
of all the genes inserted into any one cell. When a large number of heterologous
genes from a wide variety of distantly related species is assembled in one cell,
chances are great that some of the heterologous genes are lethal or sub-lethal to
the cell, or that several gene products will compete for the same substrates. When
only co-expression of the inserts is possible, novel metabolic pathways may remain
undiscovered for this reason, or because the novel metabolite is further
metabolised to a known metabolite by another inserted enzyme.


Summary of the invention 

According to a first aspect the invention relates to a library comprising a collection of
individual cells, the cells being denoted
cell1, cell2, ..., celli, wherein i ≥ 2,

each cell comprising at least one concatemer of individual oligonucleotide
cassettes, each concatemer comprising a nucleotide sequence of the following
formula:

[rs2-SP-PR-X-TR-SP-rs1]n

wherein rs1 and rs2 together denote a restriction site, SP denotes a spacer of at
least two bases, X denotes an expressible nucleotide sequence, PR denotes a
promoter capable of regulating the expression of X in the cell, TR denotes a
terminator, and n ≥ 2, and

wherein at least one concatemer of cell1 is different from a concatemer of cell2.
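The structure defined in this aspect can be pictured as a small data model. The following is a minimal, hypothetical sketch only: it assumes each cassette is reduced to its PR, X and TR labels (the common rs2/SP/SP/rs1 flanks are left implicit), that a concatemer is an ordered tuple of cassettes, and it reads "different" as "cell1 carries a concatemer that does not occur in cell2". The class and function names are illustrative and not part of the invention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cassette:
    """One [rs2-SP-PR-X-TR-SP-rs1] unit; the shared rs2/SP/SP/rs1 flanks are implicit."""
    promoter: str     # PR
    expressible: str  # X
    terminator: str   # TR


def is_valid_concatemer(concatemer: tuple) -> bool:
    """A concatemer [rs2-SP-PR-X-TR-SP-rs1]n must contain n >= 2 cassettes."""
    return len(concatemer) >= 2 and all(isinstance(c, Cassette) for c in concatemer)


def is_valid_library(cells: list) -> bool:
    """Check the first-aspect conditions on a list of cells.

    Each cell is a list of concatemers. Requires i >= 2 cells, every cell to
    carry only valid concatemers (at least one), and, under the reading assumed
    here, cell1 to carry a concatemer that does not occur in cell2."""
    if len(cells) < 2:
        return False
    for cell in cells:
        if not cell or not all(is_valid_concatemer(c) for c in cell):
            return False
    return any(c not in cells[1] for c in cells[0])
```

Using frozen dataclasses makes cassettes hashable and comparable by value, so two cells can be compared concatemer by concatemer.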


The library according to this embodiment of the invention may in any one cell
comprise a unique, and preferably random, combination of a high number of
expression cassettes that are heterologous to the host cells. Through this random
combination of expression cassettes, novel and unique combinations of gene
products are obtained in each cell. Such libraries are especially suited to the
discovery of novel metabolic pathways created through the non-native combinations
of gene products.

Due to the common structure of the expression cassettes, they may easily be
assembled into concatemers and inserted into the host cells via appropriate vectors.
Furthermore, the cassettes may at any point be excised from the host cells again
using a restriction enzyme specific for the rs1-rs2 restriction site, preferably without
excising the host cell's native genes. After excision, the expression cassettes may be
mixed with other expression cassettes of similar structure, re-concatenated
and re-inserted into another host cell in another combination.
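Excision and re-concatenation can be illustrated with a toy string model. This sketch rests on stated assumptions: the rs1-rs2 site is taken to be AscI (one of the restriction sites named in the entry vectors of Figs. 4 to 6; recognition sequence GGCGCGCC, cut after the first two bases on the top strand), only the top strand is modelled so the 4-base overhang is not represented, and the site is assumed not to occur inside any cassette. The function names are hypothetical.

```python
import random

# Assumption for illustration: rs1-rs2 is the AscI site GG^CGCGCC (top strand only).
SITE = "GGCGCGCC"
CUT_OFFSET = 2  # the top strand is cut after "GG"


def excise(concatemer: str) -> list[str]:
    """Cut at every site and return the internal fragments (the cassettes).

    Each fragment starts with the rs2 half ("CGCGCC") and ends with the
    rs1 half ("GG") of the flanking sites."""
    cuts = []
    start = concatemer.find(SITE)
    while start != -1:
        cuts.append(start + CUT_OFFSET)
        start = concatemer.find(SITE, start + 1)
    # Fragments between consecutive cut positions are the released cassettes.
    return [concatemer[a:b] for a, b in zip(cuts, cuts[1:])]


def reconcatenate(cassettes: list[str], rng: random.Random) -> str:
    """Randomly reorder cassettes and join them; each rs1|rs2 junction
    regenerates a complete site, and the outer ends are completed too."""
    shuffled = cassettes[:]
    rng.shuffle(shuffled)
    return SITE[:CUT_OFFSET] + "".join(shuffled) + SITE[CUT_OFFSET:]
```

Because every junction regenerates the complete site, a re-concatenated product can be cut again with the same enzyme, which is what allows repeated rounds of mixing.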

A further advantage of the common structure of the expression cassettes is that the
common rs1-rs2 sequence may be used as a tag for targeted PCR amplification of
the expression constructs.

The expressible nucleotide sequences may conveniently arise from a cDNA library
obtained from one or more expression states, wherein the cDNA clones have been
inserted into expression constructs. Following excision of the expression constructs
from the vectors comprising them in the cDNA library, the multitude of
constructs may be concatenated and inserted into a host cell.

Each unique cell according to the invention may comprise a selection of expressible
nucleotide sequences from just one expression state, and can thus be assembled
from one library representing this expression state, or it may comprise cassettes
from a number of different expression states. The variation among and between
cassettes in the cells may be chosen so as to minimise the chance of cross-over as the
host cell undergoes cell division, for example by minimising the level of repeat
sequences occurring in concatemers, since it is not an object of this embodiment of
the invention to obtain inter- or intrachromosomal recombination of the concatemers,
nor to obtain recombination with epitopes of the host cell.
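One simple way to quantify the "level of repeat sequences" in a concatemer is to look for k-mers shared between different cassettes. This is a hypothetical check, not a procedure described in this application: the k-mer length is an arbitrary assumption, the common rs/SP flanks should be stripped before calling it (they are repeated by design), and reverse-complement repeats are not considered.

```python
from collections import defaultdict


def shared_kmers(cassette_bodies: list[str], k: int = 12) -> dict[str, set[int]]:
    """Return every k-mer that occurs in more than one cassette body,
    mapped to the indices of the cassettes containing it.

    Bodies should exclude the common rs/SP flanks, which are shared by design."""
    seen: dict[str, set[int]] = defaultdict(set)
    for idx, body in enumerate(cassette_bodies):
        for i in range(len(body) - k + 1):
            seen[body[i:i + k]].add(idx)
    # Keep only k-mers found in at least two different cassettes.
    return {kmer: owners for kmer, owners in seen.items() if len(owners) > 1}
```

A concatemer design step could then prefer cassette combinations for which this dictionary is empty or small.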

The contents of the concatemers may be mixed according to any criteria. Thus a
library or a sub-library of individual cells may comprise cells having a common
phenotype, cells comprising expression cassettes from a common source, or cells
comprising specific combinations of promoter and expressible nucleotide
sequences. A library or sub-library may also, or alternatively, comprise a collection of
individual cells comprising one or more common concatemers in addition to differing
concatemers, wherein the common concatemer may represent expression
constructs from a common source or coding for genes with a property in common.


According to another aspect the invention relates to a library comprising a collection
of individual cells, the cells being denoted
cell1, cell2, ..., celli, wherein i ≥ 2,

each cell comprising at least two expression cassettes comprising a nucleotide
sequence of the following formula:

[rs2-SP-PR-X-TR-SP-rs1]

wherein rs1 and rs2 together denote a restriction site, SP denotes a spacer of at
least two bases, X denotes an expressible nucleotide sequence, PR denotes a
promoter capable of regulating the expression of X in the cell, TR denotes a
terminator, and

wherein at least one of the expression cassettes comprises an expressible
nucleotide sequence heterologous to the cell, and at least one of the
cassettes of cell1 is different from the cassettes of cell2.

According to this aspect of the invention, the cells are defined with reference to the
expression cassettes. This aspect of the invention shares many advantages with the
first aspect of the invention.

According to a third aspect the invention relates to a library comprising a collection
of individual cells, the cells being denoted
cell1, cell2, ..., celli, wherein i ≥ 2,

each cell comprising a random combination of heterologous oligonucleotides
having the general formula:

[PR-X]

wherein X denotes an expressible nucleotide sequence, and PR denotes an
independently controllable promoter operably associated with X.

In a library according to this aspect of the invention, the mixing of gene products
may be done not only upon insertion of the expressible nucleotide sequences but
also during expression, by inducing and/or repressing one or more promoters, each
regulating the expression of a random group of expressible nucleotide sequences.
Thus in each cell, a unique sub-set of genes may be induced and/or repressed at
any point.

This feature adds another level of potential variation in the discovery of novel
biochemical pathways. By the up- and down-regulation of independent promoters,
any combination of sub-sets of genes may be turned on or off in a population of cells
having a random combination of promoters and expressible nucleotide sequences.
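The effect of independently controllable promoters can be modelled by mapping each promoter to the expressible sequences it controls in a given cell. In this hypothetical sketch (promoter names such as "P1" and the gene names are placeholders), inducing a set of promoters switches on only the genes under their control, and enumerating every induction combination lists the sub-sets of genes that can be turned on in that cell.

```python
from itertools import combinations


def expressed_genes(cell: dict[str, list[str]], induced: set[str]) -> set[str]:
    """Genes transcribed when only the promoters in `induced` are switched on.

    `cell` maps each independently controllable promoter to the expressible
    sequences placed under its control in that cell."""
    return {gene for promoter, genes in cell.items() if promoter in induced for gene in genes}


def all_expression_states(cell: dict[str, list[str]]) -> dict[tuple, set[str]]:
    """Map every combination of induced promoters to the genes it switches on."""
    promoters = sorted(cell)
    return {
        induced: expressed_genes(cell, set(induced))
        for r in range(len(promoters) + 1)
        for induced in combinations(promoters, r)
    }
```

With p independent promoters a cell offers 2^p expression states, and two cells carrying the same genes under different promoters expose different gene sub-sets; a lethal gene can be kept silent simply by not inducing its promoter.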

In the evolution of novel biochemical pathways based on the insertion and
expression of a high number of heterologous genes in a population of cells, it is
highly likely that cells will be killed by the formation of lethal gene products. If
each cell comprises just one lethal gene, the co-expression of a number of
heterologous genes will not lead to any novel biochemical pathways. By having a
random combination of promoters and expressible nucleotide sequences, it may be
possible to down-regulate lethal or sub-lethal genes without affecting the expression
of the other heterologous expression constructs.

It is also possible to use the co-ordinated expression obtained through the random
combination of promoters and expressible nucleotide sequences from the same pool
of expressible nucleotide sequences to identify expressible nucleotide sequences
involved in a desired or unwanted property (e.g. lethality or sub-lethality). In a
population according to this aspect of the invention, each cell may in principle
comprise more or less the same heterologous expressible nucleotide sequences,
the difference between the cells being the groups of expressible nucleotide
sequences that are induced/repressed by a given promoter. In such a population of
cells, a desired or unwanted property will be identified in different cells following
induction/repression of different promoters. As an illustrative example, in cell A the
property may be associated with induction of promoters 1, 2 and 3, and in cell B with
induction of promoters 5 and 6. With this information it is possible to narrow the
property (or properties) down to the groups of expressible nucleotide sequences
associated with these promoters in these cells. The expression constructs may be
isolated using knowledge of the promoter nucleotide sequence, and sequences
common to the identified cells may be identified. Thus, by turning on and off only
certain sub-sets of genes at a time, it is possible to identify which gene combinations
have given rise to a particular phenotype.
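The cell A / cell B reasoning can be written as a set intersection. This sketch is a simplification under stated assumptions: the genes under each promoter in each cell are taken as already known (for example after isolating the constructs via the promoter sequence), and the property is assumed to be caused by genes common to all implicated groups. The function name and the gene/promoter labels are hypothetical.

```python
def candidate_genes(observations: list[tuple[dict[str, list[str]], list[str]]]) -> set[str]:
    """Intersect the gene groups implicated in each cell.

    `observations` is a list of (cell, promoters) pairs: `cell` maps each
    promoter to the expressible sequences it controls in that cell, and
    `promoters` are those whose induction was associated with the property."""
    candidates = None
    for cell, promoters in observations:
        implicated = {gene for p in promoters for gene in cell[p]}
        candidates = implicated if candidates is None else candidates & implicated
    return candidates or set()
```

Each additional cell showing the property further shrinks the candidate set, which is the rationale for screening many cells with differing promoter assignments.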

According to a further aspect the invention relates to a library comprising at least
one library or at least one sub-library as defined above. In the evolution of novel
biochemical pathways, it may be preferable to use a number of libraries or
sub-libraries and to evolve these in parallel, or to mix the libraries, in order to improve
the chances of identifying a desired property.

According to a further aspect the invention relates to a method of producing a library
comprising a collection of individual cells, comprising the steps:

i) providing a population of nucleotide cassettes having the general
formula [rs2-SP-PR-X-TR-SP-rs1], wherein rs1 and rs2 together denote
a restriction site, SP denotes a spacer of at least two bases, X
denotes an expressible nucleotide sequence, PR denotes a
promoter capable of regulating the expression of X in the cell, TR
denotes a terminator,

ii) assembling random sub-sets of the cassettes into concatemers
comprising at least two cassettes,

iii) ligating the concatemers into vectors,

iv) introducing the vectors into host cells, and

v) mixing at least two cells so that at least one concatemer of a first cell
comprises a random sub-set of cassettes different from the
random sub-set of cassettes of a concatemer of a second cell.






The assembly of concatemers is facilitated by the common structure of the
expression cassettes. When the rs1-rs2 restriction site produces sticky ends with a
predetermined nucleotide sequence, the assembly of the concatemers becomes
especially easy to perform.

The randomisation of the cassettes may be done at any stage, i.e. during a 
preceding step in which an entry library (for storing and amplifying cassettes) is 
produced or during the insertion into vectors and/or during the transformation into 
host cells. Preferably the randomisation is done during the concatenation step. 



wo 02/059297 PCT/DK02/00056 


According to another aspect the invention relates to a method of producing a library
comprising a collection of individual cells, comprising the steps:

i) inserting at least two expressible nucleotide sequences into the cloning site of at
least two primary vectors comprising a cassette, the cassette comprising
a nucleotide sequence of the general formula, in the 5'->3' direction,
[RS1-RS2-SP-PR-CS-TR-SP-RS2-RS1'], wherein RS1 and RS1' denote first
restriction sites, RS2 denotes another restriction site different from RS1
and RS1', SP denotes a spacer sequence of at least two nucleotides, PR
denotes a promoter, CS denotes a cloning site, and TR denotes a
terminator,

ii) excising the cassettes using at least a restriction enzyme specific for
RS1, RS1' and RS2, obtaining expression cassettes having the general
formula [rs2-SP-PR-X-TR-SP-rs1], wherein rs1-rs2 together denote a
restriction site, and wherein X denotes an expressible nucleotide
sequence,

iii) inserting the expression cassettes into a vector,

iv) transferring the expression cassettes into at least two host cells, and

v) mixing at least two host cells having different cassettes.

According to this method for producing a library of individual cells, the source
expressible nucleotide sequences are first ligated into a primary vector comprising a
cloning site and a cloning cassette. This primary vector may be maintained in a
cDNA library and re-isolated for excision of the expression cassettes and insertion
into a host cell. Through this process the expressible nucleotide sequences are
given a common structure, which makes it possible to clone the cassettes into a
predetermined cloning site in a vector and to remove the cassettes selectively from
the host cells later.



According to a final aspect the invention relates to a method of producing a library
comprising a collection of individual cells, comprising the steps:

i) providing at least one expressible nucleotide sequence,

ii) ligating at least one expressible nucleotide sequence to a controllable
promoter capable of functioning in a host cell, obtaining a first
expression construct,

iii) ligating at least one expressible nucleotide sequence to another
independently controllable promoter capable of functioning in a host
cell, obtaining a second expression construct,

iv) inserting the constructs of steps ii) and iii) into at least two host cells, and

v) mixing at least two cells having a different combination of
independently controllable promoters and expressible nucleotide
sequences.

According to this aspect of the invention there is provided a convenient method for
preparation of a library of individual cells comprising expressible nucleotide
sequences under the operable control of at least two controllable promoters.

Brief description of the drawings

Fig. 1 shows a flow chart of the steps leading from an expression state to
incorporation of the expressible nucleotide sequences in an entry library (a
nucleotide library according to the invention).

Fig. 2 shows a flow chart of the steps leading from an entry library comprising
expressible nucleotide sequences to evolvable artificial chromosomes (EVAC)
transformed into an appropriate host cell. Fig. 2a shows one way of producing the
EVACs, which includes concatenation, size selection and insertion into an artificial
chromosome vector. Fig. 2b shows a one-step procedure for concatenation and
ligation of vector arms to obtain EVACs.






Fig. 3 shows a model entry vector. MCS is a multi cloning site for inserting 
expressible nucleotide sequences. Amp R is the gene for ampicillin resistance. Col 
E is the origin of replication in E. coli. R1 and R2 are restriction enzyme recognition 
sites. 

Fig. 4 shows an example of an entry vector according to the invention, EVE4.
MET25 is a promoter, ADH1 is a terminator, f1 is an origin of replication for
filamentous phages, e.g. M13. Spacer 1 and spacer 2 are constituted by a few
nucleotides derived from the multiple cloning site (MCS). SrfI and AscI are restriction
enzyme recognition sites. Other abbreviations, see Fig. 3. The sequence of the
vector is set forth in SEQ ID NO 1.

Fig. 5 shows an example of an entry vector according to the invention, EVE5. CUP1
is a promoter, ADH1 is a terminator, f1 is an origin of replication for filamentous
phages, e.g. M13. Spacer 1 and spacer 2 are constituted by a few nucleotides
derived from the multiple cloning site (MCS). SrfI and AscI are restriction enzyme
recognition sites. Other abbreviations, see Fig. 3. The sequence of the vector is set
forth in SEQ ID NO 2.


Fig. 6 shows an example of an entry vector according to the invention, EVES. CUP1
is a promoter, ADH1 is a terminator, f1 is an origin of replication for filamentous
phages, e.g. M13. Spacer3 is a 550 bp fragment of lambda phage DNA. Spacer4 is
an ARS1 sequence from yeast. SrfI and AscI are restriction enzyme recognition sites.
Other abbreviations, see Fig. 3. The sequence of the vector is set forth in SEQ ID
NO 3.

Fig. 7 shows a vector (pYAC4-AscI) for providing arms for an evolvable artificial
chromosome (EVAC) into which a concatemer according to the invention can be
cloned. TRP1, URA3, and HIS3 are yeast auxotrophic marker genes, and AmpR is
an E. coli antibiotic marker gene. CEN4 is a centromere and TEL are telomeres.
ARS1 and pMB1 allow replication in yeast and E. coli respectively. BamHI and AscI
are restriction enzyme recognition sites. The nucleotide sequence of the vector is
set forth in SEQ ID NO 4.


Fig. 8 shows the general concatenation strategy. On the left is shown a circular
entry vector with restriction sites, spacers, promoter, expressible nucleotide
sequence and terminator. These are excised and ligated randomly.
Legend: Lane M: molecular weight marker, λ-phage DNA digested with PstI. Lanes
1-9: concatenation reactions. Ratio of fragments to YAC arms (F/Y) as in table.

Figs. 9a and 9b illustrate the integration of concatenation with synthesis of evolvable
artificial chromosomes, and how concatemer size can be controlled by controlling the
ratio of vector arms to expression cassettes, as described in example 7.
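As a rough illustration of why the ratio of vector arms to cassettes governs concatemer size, a simple mass-balance estimate can be used. This is not the procedure of example 7: it is a hypothetical back-of-the-envelope model that assumes every cassette ends up in a linear product capped by one left and one right arm, so roughly n_arms / 2 EVACs share all cassettes; circularisation, unligated fragments and the spread of product sizes are ignored.

```python
def expected_cassettes_per_evac(n_cassettes: float, n_arms: float) -> float:
    """Mass-balance estimate of cassettes per concatemer.

    Assumes each EVAC consumes two arms (one left, one right) and that
    all cassettes are incorporated between arm pairs."""
    if n_arms < 2:
        raise ValueError("at least one pair of arms is needed")
    return n_cassettes / (n_arms / 2)


def expected_evac_size_kb(n_cassettes: float, n_arms: float,
                          cassette_kb: float, arms_kb: float) -> float:
    """Approximate EVAC size: cassettes per EVAC times cassette length, plus both arms."""
    return expected_cassettes_per_evac(n_cassettes, n_arms) * cassette_kb + arms_kb
```

Under this model, doubling the number of arms relative to cassettes halves the expected number of cassettes per EVAC, which is the qualitative trend the ratio is used to exploit.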

Fig. 10. Library of an EVAC-transformed population shown under 4 different growth
conditions. Coloured phenotypes can be readily detected upon induction of the
MET25 and/or the CUP1 promoters.

Fig. 11. EVAC gel. Legend: PFGE of EVAC-containing clones.
Lanes a: yeast DNA PFGE markers (strain YNN295), b: lambda ladder, c: non-transformed
host yeast, 1-9: EVAC-containing clones, with EVACs in the size range 1400-1600 kb.
Lane 2 shows a clone containing 2 EVACs sized ~1500 kb and ~550 kb
respectively. The 550 kb EVAC co-migrates with the 564 kb yeast chromosome,
resulting in an increased intensity of the band at 564 kb relative to the other
bands in the lane. Arrows point to EVAC bands.


Definitions 

Unless defined otherwise, all technical and scientific terms used herein have the
same meaning as is commonly understood by one of skill in the art to which this
invention belongs.

As used herein, growth under selective conditions means growth of a cell under
conditions that require expression of a selectable marker for survival.

By a controllable promoter is meant a promoter which can be controlled through
external manipulations such as addition or removal of a compound from the
surroundings of the cell, change of physical conditions, etc.

An independently controllable promoter may be induced/repressed substantially
without affecting the induction/repression of other promoters according to the
invention. The induction/repression of an independently controllable promoter may
affect native promoters in the host cells.

Co-ordinated expression refers to the expression of a sub-set of genes which are
induced or repressed by the same external stimulus.

Oligonucleotides 

Any fragment of nucleic acid having approximately from 2 to 10000 nucleotides.

Restriction site

For the purposes of the present invention the abbreviation RSn (n=1,2,3, etc.) is
used to designate a nucleotide sequence comprising a restriction site. A restriction
site is defined by a recognition sequence and a cleavage site. The cleavage site
may be located within or outside the recognition sequence. The abbreviations "rs1" and
"rs2" are used to designate the two ends of a restriction site after cleavage. The
sequence "rs1-rs2" together designates a complete restriction site.

The cleavage site of a restriction site may leave a double stranded polynucleotide
sequence with either blunt or sticky ends. Thus, "rs1" or "rs2" may designate either a
blunt or a sticky end.

In the notation used throughout the present invention, formulae like:
RS1-RS2-SP-PR-X-TR-SP-RS2-RS1'

should be interpreted to mean that the individual sequences follow in the order
specified. This does not exclude that part of the recognition sequence of e.g. RS2
overlaps with the spacer sequence, but it is a strict requirement that all the items
except RS1 and RS1' are functional and remain functional after cleavage and
reassembly. Furthermore, the formulae do not exclude the possibility of having
additional sequences inserted between the listed items. For example, introns can be
inserted as described in the invention below, and further spacer sequences can be
inserted between RS1 and RS2 and between TR and RS2. What is important is that the
sequences remain functional.

Furthermore, when reference is made to the size of the restriction site and/or to
specific bases within it, only the bases in the recognition sequence are referred to.




Expression state 

An expression state is a state in any specific tissue of any individual organism at any
one time. Any change in conditions leading to changes in gene expression leads to
another expression state. Different expression states are found in different
individuals and in different species, but they may also be found in different organs in the
same species or individual, and in different tissue types in the same species or
individual. Different expression states may also be obtained in the same organ or
tissue in any one species or individual by exposing the tissues or organs to different
environmental conditions, including but not limited to changes in age, disease,
infection, drought, humidity, salinity, exposure to xenobiotics, physiological effectors,
temperature, pressure, pH, light, gaseous environment, and chemicals such as toxins.

Artificial chromosome 

As used herein, an artificial chromosome (AC) is a piece of DNA that can stably
replicate and segregate alongside endogenous chromosomes. For eukaryotes the
artificial chromosome may also be described as a nucleotide sequence of
substantial length comprising a functional centromere, functional telomeres, and at
least one autonomous replicating sequence. It has the capacity to accommodate
and express heterologous genes inserted therein. It is referred to as a mammalian
artificial chromosome (MAC) when it contains an active mammalian centromere.
Plant artificial chromosome and insect artificial chromosome (BUGAC) refer to
chromosomes that include plant and insect centromeres, respectively. A human
artificial chromosome (HAC) refers to a chromosome that includes human
centromeres. AVACs refer to avian artificial chromosomes. A yeast artificial
chromosome (YAC) refers to chromosomes that are functional in yeast, such as
chromosomes that include a yeast centromere.

As used herein, stable maintenance of chromosomes occurs when at least about
85%, preferably 90%, more preferably 95% of the cells retain the chromosome.
Stability is measured in the presence of a selective agent. Preferably these
chromosomes are also maintained in the absence of a selective agent. Stable
chromosomes also retain their structure during cell culturing, suffering neither
intrachromosomal nor interchromosomal rearrangements.
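The stability thresholds above reduce to simple arithmetic on the fraction of cells retaining the chromosome. A minimal sketch, assuming the retained and total cell counts come from some assay performed under selection; the function name is hypothetical.

```python
def retention_threshold_met(retained: int, total: int) -> int:
    """Return the highest stability threshold (95, 90 or 85 %) met by the
    fraction of cells retaining the chromosome, or 0 if none is met."""
    if total <= 0 or not 0 <= retained <= total:
        raise ValueError("need total > 0 and 0 <= retained <= total")
    for threshold in (95, 90, 85):
        # Integer comparison avoids floating-point rounding at the boundaries.
        if 100 * retained >= threshold * total:
            return threshold
    return 0
```

For example, 17 of 20 colonies retaining the chromosome is exactly 85%, which meets the minimum stability criterion but not the preferred ones.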



Detailed description of the invention

The present invention relates to libraries of individual cells useful for capturing and
preserving a diversity of genetic resources from nature, and for expressing the
captured genetic resources and allowing them to interact to produce a diversity of
chemical structures. The invention also facilitates screening for desirable properties
and compounds.

More particularly, the invention provides methods for constructing and screening
libraries of individual cells comprising heterologous expressible nucleotide
sequences. These libraries comprise random assortments of expressible nucleotide
sequences from multiple expression states, and preferably also from multiple species,
the products of which are allowed to interact with each other in the expression host
and in some cases result in the formation of novel biochemical pathways and/or the
production of novel classes of compounds. Moreover, the libraries of the invention
provide efficient access to otherwise inaccessible sources of molecular diversity.

The novel biochemical pathways may carry out processes including but not limited
to structural modification of a compound, addition of chemical groups to the
compound, or decomposition of the compound.

The novel classes of compound may include but are not limited to metabolites,
secondary metabolites, enzymes, or structural components of an organism. A
compound of interest may have one or more potential therapeutic properties,
including but not limited to agonist or antagonist activity at a class of receptors or a
particular receptor, or antibiotic, antiviral, antitumor, pharmacological or
immunomodulating properties, or it may be another commercially valuable chemical
such as a pigment.

A library of individual cells is a library comprising expression constructs prepared
from randomly assembled or even concatenated expressible nucleotide sequences
derived from a plurality of species of donor organisms, in which the expressible
nucleotide sequences are operably associated with regulatory regions that drive
expression of the expressible nucleotide sequences in an appropriate host
organism. The host organisms used are capable of producing functional gene
products of the donor organisms. Upon expression in the host organism, gene
products of the donor organism(s) may interact to form novel biochemical pathways.

Generally, the methods of the invention comprise providing expressible nucleotide
sequences derived from one or more donor organism(s), engineering said
expressible nucleotide sequences into a context where said expressible nucleotide
sequences can be transcribed in a given host organism, and introducing said
expressible nucleotide sequences into a host organism via a cloning or expression
vector so that one or more expressible nucleotide sequences of the donor
organism(s) are transferred to and expressed in the host organism. Such host
organisms containing donor expressible nucleotide sequences are pooled to form a
library.

The transferred genetic material typically comprises a random assortment of
expressible nucleotide sequences, the expression of which is driven and controlled
by one or, preferably, more functional regulatory regions. The expression construct
or vector advantageously provides these regulatory regions. The expressible
nucleotide sequences of the donor organism(s) are transcribed, translated and
processed in the host organism to produce functional proteins that in turn generate
the metabolites of interest.

Once a desirable activity or compound is identified, downstream drug development
efforts such as strain improvement and process development are greatly facilitated.
The positive clone can be cultured under standard conditions to produce the desired
compound in substantial amounts for further studies or uses. The expressible
nucleotide sequences of the biochemical pathway are immediately available for
sequencing, mutation, expression, and further rounds of screening. The cloned
biochemical pathway is readily amenable to traditional and/or genetic manipulations
for overproduction of the desired compound.


Furthermore, according to the embodiments comprising the expression cassettes
with common structure, several positive cells may be identified and their expression
cassettes excised by virtue of the common restriction site, which is
preferably a rare restriction site. The excised expression cassettes may be
re-assembled in a random or targeted manner to produce novel combinations of the
selected expression cassettes.

Furthermore, biochemical pathways that are otherwise silent or undetectable in the
donor organism may be discovered more easily by virtue of their functional
reconstitution in the host organism. Since the biochemical characteristics of the host
organism are well known, many deviations resulting from expression of donor genetic
material can readily be recognised. Novel compounds may be detected by
comparing extracts of a host organism containing donor genetic material against a
profile of compounds known to be produced by the control host organism under a
given set of environmental conditions. Even very low levels of a desirable activity or
compound may be detected when the biochemical and cellular background of
the host organism is well characterised.

In one embodiment, the methods may be applied to donor organism(s) that cannot
be recovered in substantial amounts in nature, or cultured in the laboratory. By
transferring genetic material such as cDNA from such organisms into a host
organism, the organisms' metabolic pathways may be reproduced, and their
products tested efficiently for any desirable properties. Thus, the genetic diversity of
these organisms is captured and preserved and combined with the genetic diversity
of other organisms.

In another embodiment of the invention, a library can be constructed in which the
expressible nucleotide sequences from one or multiple donor organisms are
randomly concatenated prior to introduction into the host organism. Thus, each host
organism in the library may individually contain a unique, random combination of
expressible nucleotide sequences derived from the various donor pathways or
organisms. For the most part, such combinations of expressible nucleotide
sequences in the library do not occur in nature. Upon expression, the functional
gene products of the various donor pathways or organisms interact with each other
and with the native host complement of gene products in individual host organisms
to generate combinations of biochemical reactions which result in novel metabolic
pathways and/or production of novel compounds. Collectively, the genetic resources
of the donor organisms in the library are translated into a diversity of chemical
compounds that may not be found in individual donor organisms.
In another aspect of the invention, the methods may be applied to the generation of
a multiple-kingdom pathway in the host organism. An example of this would be the
introduction of genes from carotenoid pathways (obtained from fungi, algae and/or
plants) together with genes for the synthesis of vitamin A (obtained from animals) or
genes coding for the production of visual pigments (obtained from insects). By such
targeted selection and combination of elements of biochemical pathways across
kingdoms, the likelihood of obtaining novel metabolites may be further increased.

In another aspect of the invention, the species of donor organisms may be selected
on the basis of their biological characteristics. Such biological characteristics may
include, but are not limited to, the capability to utilise certain nutrients, to survive
under extreme conditions or to derivatise a chemical structure, and the ability to
break down or catalyse formation of certain types of chemical linkages. When
expressible nucleotide sequences of the donor organism are expressed in the host
organism, the donor gene products can modify and/or substitute the functions of host
gene products that constitute host metabolic pathways, thereby generating novel
hybrid pathways. Novel activities and/or compounds may be produced by hybrid
pathways comprising donor- and host-derived components. The target metabolic
pathway modified by donor gene products may be native to the host organism.
Alternatively, the target metabolic pathway may be provided by products of
heterologous genes which are endogenous or have been genetically engineered into
every host organism prior to or contemporaneously with construction of the gene
expression library. Thus, the present invention also embodies constructing and
screening gene expression libraries wherein DNA fragments encoding metabolic
pathways of donor organisms are cloned and coexpressed in host organisms
containing a target metabolic pathway.

In another embodiment of the invention, the host organism may have an enhanced
complement of active drug efflux systems which secrete the compounds of interest
into the culture medium, thus reducing the toxicity of the compounds to the host
organism. Adsorptive material, e.g. neutral resins, may be used during culturing of
the host organisms, whereby metabolites produced and secreted by the host
organism may be sequestered, thus facilitating recovery of the metabolites.
In many respects, the libraries provide significant convenience and time advantages
in the various steps of developing novel small molecules, such as the development of
drugs up to clinical trials. The libraries of the invention are compatible with, e.g., the
established multi-well footprint format and robotics for high-throughput screening.
The host organisms of the invention are organisms commonly used for genetic
manipulation and/or process development. The present invention takes advantage of
the fact that such host organisms or production hosts are well characterised in terms
of their biological properties and maintenance requirements. By transferring genetic
material from a donor organism to other, more familiar expression systems, the need
for difficult culturing conditions for the donor organism is reduced. Thus, the
biological activities and the pharmacokinetic and toxic properties of any lead
compound discovered in the system of the invention may be studied and optimised
more efficiently.


The novel metabolic pathway generated in a positive clone can be delineated by
standard techniques in molecular biology. The lead compound may be synthesised
by culturing a clone of the drug-producing host organism under standard or
empirically determined culture conditions, so that sufficient quantities of the lead
compound may be isolated for further analysis and development. High-purity
manufacturing protocols, such as Good Manufacturing Practice (GMP), are already
established for some of these standard industrial host organisms. Unlike
conventional methods of screening natural product sources, less effort is required to
adapt the screening and production technologies to the particular requirements of
each potentially drug-producing organism.

The present invention also provides libraries made according to the methods of the
invention from genetic materials of a particular set of donor organisms and/or cell
types. Not all organisms or cell types in a set, especially mixed samples, need to be
individually identified or characterised to enable preparation of the libraries.

Any library of the invention may be amplified, replicated and stored. Amplification is
preferably performed by introducing entry vectors containing expressible nucleotide
sequences into an initial host organism such as E. coli, so that multiple clones of
the expressible nucleotide sequences are produced. Replication refers to picking
and growing individual clones in the library. A library of the invention may be
stored and retrieved by any technique known in the art that is appropriate for the
host organism. Thus, the libraries of the invention are an effective means of
capturing and preserving the genetic resources of donor organisms, which may be
accessed repeatedly in a drug discovery program or other discovery programs.

Concatemer Assemblage 

Concatemers may be assembled from cDNA libraries on a routine basis. A typical
concatemer generation step will pool, e.g., 1,000 genes (i.e. cDNA expression
constructs) from one sample and use this pool to generate 1,000 concatemers with an
average of 25 genes per concatemer. This means that, on average, each gene will be
present in 25 different concatemers within a pool. One such concatemer "Source Pool"
may be generated per source cDNA library. The Source Pools are suitable for storage
of the concatemers.

However, the invention is not limited to any specific number of genes in a source
pool. Concatemers with approximately 500 genes are easily produced, and it is
contemplated that this number can be increased even further.


The actual numbers depend on the number of different promoters and/or spacers
and/or terminators to be incorporated. For example, if an expression state gives 1,000
different cDNAs and these are to be combined with 2 promoters and/or spacers and/or
terminators, the numbers increase proportionally: 1,000 cDNAs give 2,000 expression
constructs, so if each construct should still be present in 25 concatemers of 25
constructs, the source pool size would be 2,000 concatemers.
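The pool-size arithmetic in the two preceding paragraphs reduces to one formula: the number of concatemers equals the number of constructs times the desired coverage, divided by the number of constructs per concatemer. The helper below is a minimal sketch of that formula. It assumes that coverage and concatemer size are both averages. The function and parameter names are illustrative and are not part of the described method.

```python
def source_pool_size(n_cdnas: int, n_variants: int, coverage: int,
                     genes_per_concatemer: int) -> int:
    """Number of concatemers so that each expression construct appears, on
    average, in `coverage` concatemers of `genes_per_concatemer` constructs."""
    # Each cDNA is combined with every promoter/spacer/terminator variant.
    n_constructs = n_cdnas * n_variants
    # Total number of construct "slots" to be filled across the pool.
    total_slots = n_constructs * coverage
    # Round up so that the coverage target is met (integer ceiling division).
    return -(-total_slots // genes_per_concatemer)


print(source_pool_size(1000, 1, 25, 25))  # 1,000 cDNAs, one variant each
print(source_pool_size(1000, 2, 25, 25))  # 1,000 cDNAs combined with 2 promoters
```

Ceiling division is used so that a fractional result is rounded up rather than down, keeping coverage at or above the target.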

Certain Source Pools may in fact be generated on a function rather than species
basis. Such a source pool may for example be based on sources known for a
specific property, such as carotenoid activity, pharmaceutical properties,
chemotaxonomic properties, etc.



Host library assemblage

Source Pools may be mixed and used to generate host libraries or screening
libraries, with each host containing multiple concatemers. In selecting which Source
Pools to mix, one may use knowledge of the source of given libraries, host pathways,
the desired focus of particular programmes and success rates of given libraries in
particular screens.

If each source library is constructed from 1,000 different genes and assembled into
EVACs each containing 25 genes, then for any one given gene, of those EVACs that
do contain the gene, 98.8% will contain just one copy, 1.2% will contain 2 copies and
0.01% will contain 3 or more copies. Thus, for all practical purposes, each EVAC can
in this situation be regarded as composed of 25 different genes. Should a cell
population be created from four such source libraries, then each different gene
(assuming no overlap between genes from different sources) will be represented at
a frequency of 1 copy per 4,000 genes.
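The copy-number percentages above are consistent with a binomial model. In that model, each of the 25 gene positions in an EVAC is filled independently and at random from the 1,000 genes of the source library. The sketch below assumes that model and conditions on the EVAC containing the gene at least once. The function name is illustrative.

```python
from math import comb


def copies_given_present(n_genes: int, genes_per_evac: int):
    """Binomial model: each gene slot of an EVAC is filled independently and
    uniformly from `n_genes` genes. Returns P(1), P(2) and P(>=3) copies of
    one given gene, conditional on the EVAC containing it at least once."""
    p = 1 / n_genes
    n = genes_per_evac
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    present = 1 - pmf[0]  # probability the EVAC contains the gene at all
    one = pmf[1] / present
    two = pmf[2] / present
    three_or_more = sum(pmf[3:]) / present
    return one, two, three_or_more


one, two, more = copies_given_present(1000, 25)
print(f"{one:.1%} one copy, {two:.1%} two copies, {more:.3%} three or more")
# Frequency of any one gene when four 1,000-gene source libraries are pooled
print(1 / (4 * 1000))
```

Under this model the "3 or more copies" share comes out just under 0.01%, which is consistent with the rounded figure in the text.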

In a cell population where each cell contains four EVACs generated from a pool of
4 source libraries, then in respect of any one of the source libraries, statistically:

• 0.4% of cells will have all four of their EVACs from this source

• 4.3% of cells will have three out of four EVACs from this source

• 25.5% of cells will have two EVACs from this source

• 38.3% of cells will have just one EVAC from this source

• 31.6% of cells will not contain any EVACs from this source

From these figures, the probability of any two-gene combination can be calculated
using standard statistical tools.
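The five percentages above can be compared against the simplest model that fits the description. In that model, each of a cell's four EVACs is drawn independently, with probability 1/4, from one of the four source libraries. This binomial model is our assumption, since the text does not state which model produced its figures. The function name is illustrative.

Under this model, the "no EVACs" and "all four EVACs" figures (31.6% and 0.4%) are reproduced. However, one, two and three EVACs from the source come out at about 42.2%, 21.1% and 4.7%. The intermediate figures listed above may therefore rest on different assumptions.

```python
from math import comb


def evacs_from_one_source(evacs_per_cell: int = 4, n_sources: int = 4):
    """P(k of a cell's EVACs come from one given source library), assuming each
    EVAC is drawn independently and with equal probability from the sources."""
    p = 1 / n_sources
    return {k: comb(evacs_per_cell, k) * p**k * (1 - p) ** (evacs_per_cell - k)
            for k in range(evacs_per_cell + 1)}


for k, prob in evacs_from_one_source().items():
    print(f"{k} EVAC(s) from the source: {prob:.1%}")
```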

For more focused evolutionary approaches, such as the evolution of novel carotenoids
or other known structural classes or metabolite pathways, EVACs can be enriched for
enzymes, and homologs or functional analogs of these enzymes, that conduct
different stages of the metabolic pathway. Such an approach can lead to significant
probabilities that essentially all steps of a given pathway are represented, at least at
the transcription level, in a cell. Thus, if a 10-step pathway is required and 50-gene
EVACs are constructed randomly from genes encoding homologous or analogous
enzymes to those responsible for each step, then any given step will be encoded
between 3 and 9 times (inclusive) in >85% of EVACs and will be entirely missing in
just 0.52% of EVACs. It can thus be seen that a 10® member cell population where
each cell contains 4 EVACs of 50 genes each, constructed from 4 enzyme-encoding
gene pools, will contain a large number of cells in which all steps of the potential
pathway are represented, in most cases multiple times.
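The figures for the 10-step pathway can be checked with a binomial model. The model assumes, as a simplification, that each of the 50 genes in an EVAC encodes one of the 10 steps with equal probability 1/10, independently of the others. Under that assumption, the sketch below gives about 0.52% for a step being absent and about 86% for 3 to 9 copies, in line with the text. The function name is hypothetical.

```python
from math import comb


def step_coverage(genes_per_evac: int = 50, n_steps: int = 10):
    """Binomial model: each gene in an EVAC encodes one of `n_steps` pathway
    steps with equal probability. Returns P(step absent) and P(3-9 copies)."""
    p = 1 / n_steps
    pmf = [comb(genes_per_evac, k) * p**k * (1 - p) ** (genes_per_evac - k)
           for k in range(genes_per_evac + 1)]
    return pmf[0], sum(pmf[3:10])


missing, three_to_nine = step_coverage()
print(f"step missing: {missing:.2%}, 3-9 copies: {three_to_nine:.1%}")
# Probability that a step is absent from all four EVACs of a cell,
# additionally assuming the four EVACs are independent of each other.
print(missing ** 4)
```

The last line adds a further independence assumption across the four EVACs of a cell. Under it, the chance of one step being absent from a whole cell falls below 10^-9.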

Sub-libraries 

Initial screens are designed to sort host lines into "collections", or sub-libraries,
based on whether novel activity has been induced by the concatemers and on the
type of activity that has been induced. As such, initial screens should be reasonably
high-throughput and should be arbitrary in their selection criteria.

A large number of such screens can be considered. Illustrative examples of such
screens include, but are not limited to:

• Novel spectral properties 

• Induced cytochrome oxidase activity 

• Changed size, morphology, stickiness or adhesive properties or lack thereof 

• Ability to grow on substrates they cannot normally grow on 
• Ability to grow on sublethal substrates

• Ability to grow in the absence of normal essential requirements 

• Ability to grow on media comprising one or more inhibitors 

• Ability to grow under changed physical conditions, such as temperature, 
osmolarity, electromagnetic radiation including light of certain wavelengths. 

• Ability to grow in a magnetic field of a certain strength.

• Secretion or the lack of it from the cell 

• The inhibition or prevention of inhibition of an enzyme 

• The activation of a receptor. 

• The prevention of an activating molecule binding to a receptor. 

• The inhibition or promotion of binding of small molecules or proteins to
nucleic acid or peptide sequences.

• The inhibition or promotion of transcription, translation or post-translational
processing.

• Changes in the transport or localisation of molecules within the cell or within
organelles.

• Changes in the DNA content or morphology of the cell.

• The production of small molecules with certain properties that allow their
selective isolation (e.g. all the chromatography principles available to the
skilled practitioner).

• The production of small molecules with certain spectroscopic properties
(defined broadly to include visible light, microwaves, IR, UV, X-ray, etc.).

• Changes in the morphology of the cell, including the prevention or promotion 
of cell differentiation. 

• The induction of apoptotic pathways. 

For each Host Library (of 10,000 host lines), the 1-2% of host lines that are most
extreme on each such criterion may be grouped into a sub-library. These initial
sorting screens will in general be conducted under conditions that maximise the
number of genes expressed per concatemer.
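The sorting step can be pictured as a per-criterion ranking. The sketch assumes that each screen yields one numeric score per host line, and that larger values are more extreme. For two-sided criteria, the absolute deviation from the control host could serve as that score. The function, the criterion names and the random scores are hypothetical placeholders.

```python
import random


def sort_into_sub_libraries(scores, fraction=0.02):
    """scores maps criterion -> {host_line: value}; a higher value is taken to
    mean 'more extreme'. Returns criterion -> list of selected host lines."""
    sub_libraries = {}
    for criterion, values in scores.items():
        n_keep = max(1, round(len(values) * fraction))
        ranked = sorted(values, key=values.get, reverse=True)
        sub_libraries[criterion] = ranked[:n_keep]
    return sub_libraries


rng = random.Random(0)
lines = [f"host{i:05d}" for i in range(10_000)]  # a 10,000-line Host Library
scores = {"spectral": {h: rng.random() for h in lines},
          "growth_on_inhibitor": {h: rng.random() for h in lines}}
subs = sort_into_sub_libraries(scores, fraction=0.01)  # keep the top 1%
print({criterion: len(selected) for criterion, selected in subs.items()})
```

A host line that is extreme on several criteria ends up in several sub-libraries. This is consistent with the screens being run independently of each other.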

The output of a sorting screen may be host lines that are characterised on one or 
more broad criteria. These may be categorised as sub-libraries. 

A sub-library may be defined with reference to a common phenotype of the cells in
the sub-library. However, a sub-library may also be defined with reference to the
presence of specific expressible nucleotide sequences, i.e. as a collection of individual
cells which, for at least one identical expressible DNA sequence, have different
promoters. Furthermore, a sub-library may be described with reference to a cassette
and/or a concatemer of cassettes comprised in the host cells. A sub-library may thus
be defined as a collection of individual cells, each cell having identical expressible
DNA sequences in at least one cassette of the concatemer. A sub-library may also be
regarded as a collection of individual cells which, for at least one identical
expressible DNA sequence, more preferably for substantially all identical expressible
nucleotide sequences, have different promoters.


The common phenotype of a given sub-library may be at least one phenotype
selected from the group comprising the ability to grow on unusual substrates, the
ability to grow on sublethal concentrations of toxins, the ability to grow at a high
temperature, the ability to grow at a low temperature, the ability to grow at elevated
osmolality, the ability to grow at low osmolality, the ability to grow at high salinity, the
ability to grow at low salinity, the ability to grow at elevated metal concentrations, the
ability to grow at high CO2 concentrations, the ability to grow at low CO2
concentrations, the ability to grow at high O2 concentrations, the ability to grow at
low O2 concentrations, the ability to provide special spectral properties, the ability to
provide a special colour, the ability to have a deviating GST activity, and the ability to
have a deviating P450 activity.

Size of library 

A library of cells may in principle comprise just two cells differing with respect to one
of the features discussed below. However, a library normally comprises at least 20
individual cells, such as at least 50 individual cells. More preferably, a library
comprises at least 100 individual cells, such as at least 1,000 cells, for example at
least 10,000 cells, such as at least 100,000 cells, for example at least 1,000,000
cells, such as at least 1,000,000,000 cells.

The number of cells in a sub-library depends on the selection criterion or criteria
used. At the beginning, a sub-library typically comprises fewer cells than a library,
but the cells of the sub-library may be combined or allowed to propagate sexually to
produce increased variation, and in this way the number of different cells in a
sub-library may increase.

Variation among cells 

The difference between cells in a library may be defined with reference to
differences between expression cassettes, between concatemers or differences
between promoters controlling the expression of an expressible nucleotide
sequence.

Thus, in a library according to the invention, a concatemer of each cell may comprise
at least a first cassette and a second cassette, said first cassette being different
from said second cassette. More preferably, substantially all cassettes of a
concatemer in a given cell are different.
The difference between the expression cassettes, which may be reflected in the
difference between concatemers in different cells, may be a difference in the spacer
sequences and/or the promoter and/or the expressible nucleotide sequence and/or
the intron and/or the terminator sequence.


When the differences lie in the expressible nucleotide sequences, these different
expressible nucleotide sequences may come from the same or from different
expression states. The different expression states may represent at least two
different tissues, such as at least two organs, such as at least two species, such as
at least two genera. The different species may be from at least two different phyla,
such as from at least two different classes, such as from at least two different
divisions, more preferably from at least two different sub-kingdoms, such as from at
least two different kingdoms. In this way, cells and libraries representing an
extremely wide array of gene combinations are obtained.


Preferably, substantially all cells in a library are different. This increases the number
of available combinations of expressible nucleotide sequences. Further variation
may be obtained by having one library in cells of one mating type and another library
in cells of another mating type. For yeast, this may be obtained by having one library
in MATa cells and another library in MATα cells. These may then be sexually crossed
to obtain further variation.

According to an especially preferred embodiment of the invention, the library
comprises a random combination of promoters and expressible nucleotide sequences
made from a two-dimensional array of promoters and heterologous expressible
nucleotide sequences. Thereby, it is possible in principle to have all expressible
nucleotide sequences from a given pool represented in a library under the control of
different promoters.

When each cell furthermore comprises an individual selection of combinations of
promoters and heterologous expressible nucleotide sequences, drawn individually
from the same pool of promoters and heterologous expressible nucleotide
sequences, completely random combinations of promoter and expressible nucleotide
sequences are inserted into all cells. Each expressible nucleotide sequence may
then be found in the library under the control of different promoters and in a number
of combinations with a number of other expressible nucleotide sequences.
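The two-dimensional array and the per-cell random draw described above can be simulated in a few lines. This sketch assumes that each cell receives a fixed number of promoter/sequence pairs, drawn without replacement from the full array. The promoter and sequence names, the numbers and the function are hypothetical. The final count checks that, across enough cells, each sequence ends up under the control of every promoter.

```python
import random


def build_library(promoters, sequences, n_cells, cassettes_per_cell, seed=None):
    """Each cell receives its own random selection of promoter/sequence pairs
    drawn from the full two-dimensional promoter x sequence array."""
    rng = random.Random(seed)
    array_2d = [(p, s) for p in promoters for s in sequences]
    # sample() draws without replacement, so no pair is repeated within a cell
    return [rng.sample(array_2d, cassettes_per_cell) for _ in range(n_cells)]


promoters = ["P1", "P2"]                      # hypothetical promoter names
sequences = [f"cDNA{i}" for i in range(100)]  # hypothetical expressible sequences
library = build_library(promoters, sequences, n_cells=1000,
                        cassettes_per_cell=25, seed=1)

# Collect the promoters each sequence ends up under across the whole library
promoters_per_seq = {s: set() for s in sequences}
for cell in library:
    for p, s in cell:
        promoters_per_seq[s].add(p)
print(sum(len(v) == len(promoters) for v in promoters_per_seq.values()),
      "of", len(sequences), "sequences occur under every promoter")
```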

Each library may comprise at least 2 different independently controllable promoters,
such as at least 3, for example at least 4, such as at least 5, for example at least 6,
such as at least 7, for example at least 8, such as at least 9, for example at least 10,
such as at least 15, for example at least 25, such as at least 50, for example at least
75, such as at least 100. The higher the number of promoters in the library, the more
sub-sets of genes can be constructed within any one cell and within any one library.
Preferably, the promoters are regulated independently, so that the regulation of one
promoter does not interact with that of another. This requirement for non-interaction
sets an upper limit to the number of promoters that can be used under practical
circumstances. However, new promoters are continuously being discovered and
synthetic promoters developed, so it is likely that larger combinations of different
non-interacting promoters can be made in the future.


At least one heterologous expressible nucleotide sequence may be found in at least
2 cells, such as at least 3 cells, for example at least 5 cells, such as at least 10 cells,
for example at least 25 cells, such as at least 50 cells, for example at least 100 cells,
such as at least 500 cells, for example at least 1,000 cells. By having the same
expressible nucleotide sequence represented in several, preferably many, cells, any
one expressible nucleotide sequence may be found in many combinations with
different expressible nucleotide sequences.

The combination of promoter and expressible nucleotide sequences in any one cell
may be laid out so that at least one cell comprises a group of heterologous
expressible nucleotide sequences under the control of a first promoter, the group
comprising at least 5 heterologous expressible nucleotide sequences, such as at
least 10 heterologous expressible nucleotide sequences, for example at least 15
heterologous expressible nucleotide sequences, such as at least 25 heterologous
expressible nucleotide sequences, for example at least 50 heterologous expressible
nucleotide sequences, such as at least 75 heterologous expressible nucleotide
sequences, for example at least 100 heterologous expressible nucleotide
sequences, such as at least 250 heterologous expressible nucleotide sequences, for
example at least 500 heterologous expressible nucleotide sequences. Thereby,
sub-sets of expressible nucleotide sequences of different sizes can be turned on and
off in the cells.

By furthermore having in a cell at least a second group of heterologous expressible
nucleotide sequences under the independent control of a second promoter, such as at
least a third group of heterologous expressible nucleotide sequences under the
independent control of a third promoter, for example at least a fourth group of
heterologous expressible nucleotide sequences under the independent control of a
fourth promoter, such as at least a fifth group of heterologous expressible nucleotide
sequences under the independent control of a fifth promoter, for example at least a
sixth group of heterologous expressible nucleotide sequences under the
independent control of a sixth promoter, such as at least a seventh group of
heterologous expressible nucleotide sequences under the independent control of a
seventh promoter, such as at least an eighth group of heterologous expressible
nucleotide sequences under the independent control of an eighth promoter, for
example at least a ninth group of heterologous expressible nucleotide sequences
under the independent control of a ninth promoter, such as at least a tenth group of
heterologous expressible nucleotide sequences under the independent control of a
tenth promoter, groups of expressible nucleotide sequences (sub-sets) may be
turned on and off in the cells.
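With k independently controllable promoters, each governing its own group of sequences, the groups can be switched in 2^k on/off combinations. The sketch below enumerates these states for three hypothetical promoter groups. The group names and gene labels are placeholders. The sketch assumes that the promoters do not interact, as the text recommends.

```python
from itertools import product


def expression_states(promoter_groups):
    """Enumerate every on/off state of independently controlled promoter groups
    and list the sequences expressed in each state."""
    names = list(promoter_groups)
    states = []
    for switches in product((False, True), repeat=len(names)):
        expressed = sorted(seq for name, on in zip(names, switches) if on
                           for seq in promoter_groups[name])
        states.append((dict(zip(names, switches)), expressed))
    return states


groups = {"promoter_A": ["g1", "g2"], "promoter_B": ["g3"],
          "promoter_C": ["g4", "g5"]}
for switches, expressed in expression_states(groups):
    print(switches, expressed)
```

With ten independent promoters, as in the longest embodiment listed above, the same enumeration yields 1,024 distinct on/off states.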

Origin of expressible nucleotide sequences

The expressible nucleotide sequences that can be inserted into the vectors,
concatemers and cells according to this invention encompass any type of
nucleotide sequence, such as RNA or DNA. Such a nucleotide sequence could be
obtained, e.g., from cDNA, which by its nature is expressible. But it is also possible to
use sequences of genomic DNA coding for specific genes. Preferably, the expressible
nucleotide sequences correspond to full-length genes, such as substantially
full-length cDNA, but nucleotide sequences coding for peptides shorter than those
encoded by the original full-length mRNAs may also be used. Shorter peptides may
still retain catalytic activity similar to that of the native proteins.

Another way to obtain expressible nucleotide sequences is through chemical
synthesis of nucleotide sequences coding for known peptide or protein sequences.
Thus, the expressible DNA sequences do not have to be naturally occurring
sequences, although it may be preferable for practical purposes to primarily use
naturally occurring nucleotide sequences. Whether the DNA is single or double
stranded will depend on the vector system used.

In most cases, the orientation of an expressible nucleotide sequence with respect to
the promoter will be such that the coding strand is transcribed into a proper mRNA. It
is, however, conceivable that the sequence may be reversed, generating an
antisense transcript in order to block expression of a specific gene.
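The effect of orientation can be illustrated with a reverse complement. This sketch assumes the insert is given 5'→3' as its coding strand in upper-case DNA letters. The function names and the example sequence are hypothetical. If the insert is cloned in reverse relative to the promoter, the transcript becomes the reverse complement of the normal mRNA, i.e. an antisense RNA.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def transcript(insert: str, reversed_orientation: bool = False) -> str:
    """RNA transcribed from an insert given 5'->3' as its coding strand.
    If the insert is cloned in reverse, the promoter reads the other strand
    and the product is the antisense RNA."""
    rna_like_strand = reverse_complement(insert) if reversed_orientation else insert
    return rna_like_strand.replace("T", "U")


orf = "ATGGCCAAGTAA"            # hypothetical short open reading frame
print(transcript(orf))          # sense mRNA
print(transcript(orf, True))    # antisense RNA, complementary to the mRNA
```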

Cassettes

An important aspect of the invention concerns a cassette of nucleotides in a highly
ordered sequence, the cassette having the general formula in the 5'→3' direction:

[RS1-RS2-SP-PR-CS-TR-SP-RS2'-RS1']

wherein RS1 and RS1' denote restriction sites, RS2 and RS2' denote restriction
sites different from RS1 and RS1', SP individually denotes a spacer sequence of at
least two nucleotides, PR denotes a promoter, CS denotes a cloning site, and TR
denotes a terminator.
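The ordering in the general formula can be captured as a simple data structure. The sketch below is illustrative only:

- Element names follow the formula.
- The two spacers are named separately (SP_5 and SP_3) because each may differ.
- The sequences are placeholders.
- Treating the whole RS2 to RS2' span as the excised unit ignores exactly where within each recognition site the enzymes cut.

```python
# Element order of the general cassette formula, 5' -> 3'. The two spacers
# are named separately because each may be a different sequence.
CASSETTE_ORDER = ["RS1", "RS2", "SP_5", "PR", "CS", "TR", "SP_3", "RS2'", "RS1'"]


def assemble_cassette(parts: dict) -> str:
    """Concatenate element sequences in cassette order after basic checks."""
    missing = [name for name in CASSETTE_ORDER if name not in parts]
    if missing:
        raise ValueError(f"missing elements: {missing}")
    for spacer in ("SP_5", "SP_3"):
        if len(parts[spacer]) < 2:
            raise ValueError(f"{spacer} must be at least two nucleotides")
    if {parts["RS2"], parts["RS2'"]} & {parts["RS1"], parts["RS1'"]}:
        raise ValueError("RS2/RS2' must differ from RS1/RS1'")
    return "".join(parts[name] for name in CASSETTE_ORDER)


def inner_construct(parts: dict) -> str:
    """The RS2 to RS2' portion that is joined into concatemers (approximation:
    whole recognition sites are kept)."""
    return "".join(parts[name] for name in CASSETTE_ORDER[1:-1])


# Placeholder strings stand in for real sequences (hypothetical).
parts = {"RS1": "aaaa", "RS2": "cccc", "SP_5": "tt", "PR": "PROM", "CS": "CS",
         "TR": "TERM", "SP_3": "gg", "RS2'": "cccc", "RS1'": "aaaa"}
print(assemble_cassette(parts))
print(inner_construct(parts))
```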

It is an advantage to have two different restriction sites flanking both sides of the
expression construct. By treating the primary vectors with restriction enzymes
cleaving both restriction sites, the expression construct and the primary vector will
be left with mutually non-compatible ends. This facilitates the concatenation process,
since the empty vectors do not participate in the concatenation of expression
constructs.


Restriction sites 

In principle, any restriction site for which a restriction enzyme is known can be
used. These include the restriction enzymes generally known and used in the field of
molecular biology, such as those described in Sambrook, Fritsch and Maniatis,
"Molecular Cloning: A Laboratory Manual", 2nd edition, Cold Spring Harbor
Laboratory Press, 1989.

The restriction site recognition sequences are preferably of a substantial length, so
that the likelihood of an identical restriction site occurring within the cloned
oligonucleotide is minimised. Thus, the first restriction site may comprise at least 6
bases, but more preferably the recognition sequence comprises at least 7 or 8
bases. Restriction sites having 7 or more non-N bases in the recognition sequence
are generally known as "rare restriction sites" (see example 6). However, the
recognition sequence may also be at least 10 bases, such as at least 15 bases, for
example at least 16 bases, such as at least 17 bases, for example at least 18 bases,
such as at least 19 bases, for example at least 20 bases, such as at least 21 bases,
for example at least 22 bases, such as at least 23 bases, for example at least 25
bases, such as at least 30 bases, for example at least 35 bases, such as at least 40
bases, for example at least 45 bases, such as at least 50 bases.
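How often a site of a given length occurs by chance can be estimated with a simple model. The sketch assumes a random sequence with equal, independent base frequencies. Under that assumption, a fully specified n-base site is expected about once every 4^n bases. N positions in a recognition sequence match any base, so only the specified bases contribute. Real sequences deviate from this idealisation. The function name is illustrative.

```python
def expected_sites(seq_length: int, site_length: int,
                   palindromic: bool = True) -> float:
    """Expected number of occurrences of one fully specified recognition site
    in a random sequence with equal base frequencies (0.25 each)."""
    positions = max(0, seq_length - site_length + 1)
    per_position = 0.25 ** site_length
    # A palindromic site reads the same on both strands; otherwise count both.
    strands = 1 if palindromic else 2
    return positions * per_position * strands


for n in (6, 8):
    print(f"{n}-base site: about one per {4 ** n} bases; "
          f"{expected_sites(10_000, n):.2f} expected in a 10 kb insert")
```

Each added specified base divides the expected number of chance occurrences by four. This is why longer recognition sequences reduce the risk of cutting inside the cloned insert.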

Preferably, the first restriction site, RS1 and RS1', is recognised by a restriction
enzyme generating blunt ends of the double-stranded nucleotide sequences. By
generating blunt ends at this site, the risk that the vector participates in a
subsequent concatenation is greatly reduced. The first restriction site may also give
rise to sticky ends, but these are then preferably non-compatible with the sticky ends
resulting from the second restriction site, RS2 and RS2', and with the sticky ends in
the AC.

According to a preferred embodiment of the invention, the second restriction site,
RS2 and RS2', comprises a rare restriction site. The longer the recognition sequence
of the rare restriction site, the rarer it is, and the less likely it is that the restriction
enzyme recognising it will cleave the nucleotide sequence at other, undesired
positions.


The rare restriction site may furthermore serve as a PCR priming site. Thereby, it is
possible to copy the cassettes by PCR techniques and thus indirectly "excise" the
cassettes from a vector.

Spacer sequence

The spacer sequence located between RS2 and the PR sequence is preferably a
non-transcribed spacer sequence. The purpose of the spacer sequence(s) is to
minimise recombination between different concatemers present in the same cell or
between cassettes present in the same concatemer, but it may also serve the
purpose of making the nucleotide sequences in the cassettes more "host"-like. A
further purpose of the spacer sequence is to reduce the occurrence of hairpin
formation between adjacent palindromic sequences, which may occur when
cassettes are assembled head to head or tail to tail. Spacer sequences may also be
convenient for introducing short conserved nucleotide sequences that may serve,
e.g., as PCR primer sites or as targets for hybridization to, e.g., nucleic acid, PNA or
LNA probes, allowing affinity purification of cassettes.

The cassette may also optionally comprise another spacer sequence of at least two
nucleotides between TR and RS2'. When cassettes are cut out from a vector and
concatenated into concatemers of cassettes, the spacer sequences together ensure
that there is a certain distance between two successive identical promoter and/or
terminator sequences. This distance may comprise at least 50 bases, such as at
least 60 bases, for example at least 75 bases, such as at least 100 bases, for
example at least 150 bases, such as at least 200 bases, for example at least 250
bases, such as at least 300 bases, for example at least 400 bases, such as at least
500 bases, for example at least 750 bases, such as at least 1000 bases, for example
at least 1100 bases, such as at least 1200 bases, for example at least 1300 bases,
such as at least 1400 bases, for example at least 1500 bases, such as at least 1600
bases, for example at least 1700 bases, such as at least 1800 bases, for example at
least 1900 bases, such as at least 2000 bases, for example at least 2100 bases, such
as at least 2200 bases, for example at least 2300 bases, such as at least 2400 bases,
for example at least 2500 bases, such as at least 2600 bases, for example at least
2700 bases, such as at least 2800 bases, for example at least 2900 bases, such as at
least 3000 bases, for example at least 3200 bases, such as at least 3500 bases, for
example at least 3800 bases, such as at least 4000 bases, for example at least 4500
bases, such as at least 5000 bases, for example at least 6000 bases.

The number of nucleotides between the spacer located 5' to the PR sequence and
the spacer located 3' to the TR sequence may be any number. However, it may be
advantageous to ensure that at least one of the spacer sequences comprises
between 100 and 2500 bases, preferably between 200 and 2300 bases, more
preferably between 300 and 2100 bases, such as between 400 and 1900 bases,
more preferably between 500 and 1700 bases, such as between 600 and 1500
bases, more preferably between 700 and 1400 bases.
If the intended host cell is yeast, the spacers present in a concatemer should
preferably comprise a combination of a few ARSes with varying lambda phage DNA
fragments.

5 

Preferred examples of spacer sequences include but are not limited to: Lamda 
phage DNA, prokaryotic genomic DNA such as E. coll genomic DNA, ARSes. 

Promoter 

A promoter is a DNA sequence to which RNA polymerase binds to initiate transcription. The promoter determines the polarity of the transcript by specifying which strand will be transcribed.

• Bacterial promoters normally consist of -35 and -10 consensus sequences (positions relative to the transcriptional start), which are bound by a specific sigma factor and RNA polymerase.

• Eukaryotic promoters are more complex. Most promoters utilized in expression vectors are transcribed by RNA polymerase II. General transcription factors (GTFs) first bind specific sequences near the transcriptional start and then recruit RNA polymerase II. In addition to these minimal promoter elements, small sequence elements are recognized specifically by modular DNA-binding/trans-activating proteins (e.g. AP-1, SP-1), which regulate the activity of a given promoter.

• Viral promoters may serve the same function as bacterial and eukaryotic promoters. Upon viral infection of their host, viral promoters direct transcription either by using the host transcriptional machinery or by supplying virally encoded enzymes to substitute for part of the host machinery. Viral promoters are recognised by the transcriptional machinery of a large number of host organisms and are therefore often used in cloning and expression vectors.
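The -35/-10 arrangement of bacterial promoters can be illustrated with a simple consensus scan. This is a sketch, not a promoter predictor: it assumes the E. coli sigma-70-type consensus hexamers TTGACA (-35) and TATAAT (-10) separated by a 15-19 bp spacer, and it scores candidate sites only by counting matching bases. The function name is hypothetical.

```python
from typing import Iterable, Optional, Tuple

MINUS35 = "TTGACA"  # assumed -35 consensus hexamer
MINUS10 = "TATAAT"  # assumed -10 consensus hexamer


def matches(a: str, b: str) -> int:
    """Number of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b))


def best_sigma70_like_site(seq: str, spacers: Iterable[int] = range(15, 20)
                           ) -> Optional[Tuple[int, int, int, int]]:
    """Return (score, start of -35 box, start of -10 box, spacer length) of the best hit."""
    seq = seq.upper()
    spacers = list(spacers)
    best = None
    for i in range(len(seq) - len(MINUS35) + 1):
        s35 = matches(seq[i:i + 6], MINUS35)
        for sp in spacers:
            j = i + 6 + sp  # the -10 box starts after the -35 box plus the spacer
            if j + 6 > len(seq):
                continue
            score = s35 + matches(seq[j:j + 6], MINUS10)
            if best is None or score > best[0]:
                best = (score, i, j, sp)
    return best


# A synthetic test sequence carrying both exact boxes with a 17 bp spacer.
test_seq = "GC" * 5 + "TTGACA" + "A" * 17 + "TATAAT" + "GC" * 5
print(best_sigma70_like_site(test_seq))  # (12, 10, 33, 17)
```

Real promoter strength also depends on sequences outside these two boxes, so a high score indicates only resemblance to the consensus.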

Promoters may furthermore comprise regulatory elements, which are DNA sequence elements that act in conjunction with promoters and bind either repressors (e.g. the lacO/lacIq repressor system in E. coli) or inducers (e.g. the GAL1/GAL4 inducer system in yeast). In either case, transcription is virtually "shut off" until the promoter is derepressed or induced, at which point transcription is "turned on". The choice of promoter in the cassette depends primarily on the host organism into which the cassette is to be inserted. An important requirement to this end is that the promoter should preferably be capable of functioning in the host cell in which the expressible nucleotide sequence is to be expressed.

Preferably the promoter is an externally controllable promoter, such as an inducible and/or a repressible promoter. The promoter may be controllable (repressible/inducible) by chemicals, i.e. by the absence or presence of chemical inducers, e.g. metabolites, substrates, metals, hormones or sugars. The promoter may likewise be controllable by physical or physiological parameters such as temperature, pH, redox status, growth stage or developmental stage, or it may be inducible/repressible by a synthetic inducer/repressor such as the gal inducer.

In order to avoid unintentional interference with the gene regulation systems of the host cell, and to improve the controllability of the co-ordinated gene expression, the promoter is preferably a synthetic promoter. Suitable promoters are described in US 5,798,227 and US 5,667,986. Principles for designing suitable synthetic eukaryotic promoters are disclosed in US 5,559,027, US 5,877,018 and US 6,072,050.

Synthetic inducible eukaryotic promoters for the regulation of gene transcription may achieve improved levels of protein expression and lower basal levels of gene expression. Such promoters preferably contain at least two different classes of inducible regulatory elements, usually obtained by modifying a native promoter that contains one of the inducible elements through insertion of the other. For example, additional metal responsive elements (MREs) and/or glucocorticoid responsive elements (GREs) may be provided to native promoters. Additionally, one or more constitutive elements may be functionally disabled to provide lower basal levels of gene expression.

Preferred examples of promoters include, but are not limited to, promoters induced and/or repressed by any factor selected from the group comprising carbohydrates, e.g. galactose; low inorganic phosphate levels; temperature, e.g. a low or high temperature shift; metals or metal ions, e.g. copper ions; hormones, e.g. dihydrotestosterone or deoxycorticosterone; heat shock (e.g. 39°C); methanol; redox status; growth stage, e.g. developmental stage; and synthetic inducers, e.g. the gal inducer. Examples of such promoters include ADH1, PGK1, GAP491, TPI, PYK, ENO, PMA1, PHO5, GAL1, GAL2, GAL10, MET25, ADH2, MEL1, CUP1, HSE, AOX, MOX, SV40, CaMV, Opaque-2, GRE, ARE, PGK/ARE hybrid, CYC/GRE hybrid, TPI/α2 operator, AOX1 and MOX A.

More preferably, however, the promoter is selected from hybrid promoters such as the PGK/ARE hybrid or the CYC/GRE hybrid, or from synthetic promoters. Such promoters can be controlled without interfering too much with the regulation of native genes in the expression host.

Yeast promoters 

In the following, examples of known yeast promoters that may be used in conjunction with the present invention are given. The examples are in no way limiting and only serve to indicate to the skilled practitioner how to select or design promoters that are useful according to the present invention.

Although numerous transcriptional promoters that are functional in yeasts have been described in the literature, only some of them have proved effective for the production of polypeptides by the recombinant route. These include in particular the promoters of the PGK gene (3-phosphoglycerate kinase), the TDH genes encoding GAPDH (glyceraldehyde phosphate dehydrogenase), the TEF1 gene (elongation factor 1) and MFα1 (a sex pheromone precursor), which are considered strong constitutive promoters, or alternatively the regulatable promoter CYC1, which is repressed in the presence of glucose, or PHO5, which can be regulated by thiamine. However, for reasons that are often unexplained, these promoters do not always allow effective expression of the genes they control. It is therefore always advantageous to have new promoters available in order to generate new effective host/vector systems. Furthermore, having a choice of effective promoters in a given cell makes it possible to envisage the production of multiple proteins in the same cell (for example several enzymes of the same metabolic chain) while avoiding the problems of recombination between homologous sequences.

In general, a promoter region is situated in the 5' region of a gene and comprises all the elements allowing the transcription of a DNA fragment placed under its control, in particular:

(1) a so-called minimal promoter region comprising the TATA box and the site of initiation of transcription, which determines the position of the initiation site as well as the basal level of transcription. In Saccharomyces cerevisiae, the length of the minimal promoter region is relatively variable: the exact location of the TATA box varies from one gene to another and may be situated from -40 to -120 nucleotides upstream of the site of initiation (Chen and Struhl, 1985, EMBO J., 4, 3273-3280);

(2) sequences situated upstream of the TATA box (from immediately upstream up to several hundred nucleotides away), which make it possible to ensure an effective level of transcription either constitutively (a relatively constant level of transcription throughout the cell cycle, regardless of the culture conditions) or in a regulatable manner (activation of transcription in the presence of an activator and/or repression in the presence of a repressor). These sequences may be of several types (activator, inhibitor, enhancer, inducer, repressor) and may respond to cellular factors or varied culture conditions.
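The TATA-box window given in item (1) can be expressed as a simple positional scan. This Python sketch assumes the commonly cited TATAWAW consensus (TATA, then A or T, then A, then A or T) and a known transcription start index; motif positions are reported relative to that start, with -1 being the base immediately upstream. The function name and the test sequence are hypothetical.

```python
import re
from typing import List

# Lookahead so that overlapping candidate boxes are all reported.
TATA = re.compile(r"(?=(TATA[AT]A[AT]))")


def tata_boxes(seq: str, tss_index: int, far: int = -120, near: int = -40) -> List[int]:
    """Positions (relative to the transcription start) of TATAWAW motifs whose first
    base lies between `far` and `near` nucleotides upstream, inclusive."""
    hits = []
    for m in TATA.finditer(seq.upper()):
        rel = m.start() - tss_index  # negative values are upstream of the start site
        if far <= rel <= near:
            hits.append(rel)
    return hits


# Synthetic example: one box 80 nt upstream, one 10 nt upstream, start site at index 200.
seq = "C" * 120 + "TATAAAA" + "C" * 63 + "TATAAAT" + "C" * 3 + "A" * 10
print(tata_boxes(seq, tss_index=200))  # [-80]
```

Only the box at -80 falls inside the -40 to -120 window; widening `near` to 0 would also report the one at -10.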

Examples of such promoters are the ZZA1 and ZZA2 promoters disclosed in US 5,641,661, the EF1-α protein promoter and the ribosomal protein S7 gene promoter disclosed in WO 97/44470, and the COX4 promoter and two unknown promoters (SEQ ID No: 1 and 2 in that document) disclosed in US 5,952,195. Other useful promoters include the HSP150 promoter disclosed in WO 98/54339, the SV40 and RSV promoters disclosed in US 4,870,013, and the PyK and GAPDH promoters disclosed in EP 0 329 203 A1.

Synthetic yeast promoters 

More preferably the invention employs synthetic promoters. Synthetic promoters are often constructed by combining the minimal promoter region of one gene with the upstream regulating sequences of another gene. Enhanced promoter control may be obtained by modifying specific sequences in the upstream regulating sequences, e.g. through substitution or deletion, or by inserting multiple copies of specific regulating sequences. One advantage of using synthetic promoters is that they may be controlled without interfering too much with the native promoters of the host cell.
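The construction principle above (upstream regulating sequences of one gene, optionally modified, joined to the minimal promoter of another) can be sketched as sequence assembly. This is a toy model with placeholder sequences and a hypothetical function name; it treats deletion as removal of an exact motif and places inserted copies of a regulatory element directly upstream of the minimal promoter, which is only one of several possible layouts.

```python
def build_hybrid_promoter(uas: str, minimal: str, element: str = "", copies: int = 0,
                          delete: str = "") -> str:
    """Assemble 5' -> 3': upstream regulating sequence (optionally with a motif deleted),
    `copies` extra copies of a regulatory element, then the minimal promoter region."""
    if delete:
        uas = uas.replace(delete, "")  # e.g. functionally disabling a constitutive element
    return uas + element * copies + minimal


# Placeholder sequences only; they do not represent real yeast promoter elements.
print(build_hybrid_promoter("AAAGGGTTT", "TATAAA", element="CCC", copies=2, delete="GGG"))
# AAATTTCCCCCCTATAAA
```

In practice the junctions, element orientation and spacing would also have to be designed, which this sketch ignores.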

One such synthetic yeast promoter comprises promoters or promoter elements of two different yeast-derived genes, the yeast killer toxin leader peptide, and the amino terminus of IL-1β (WO 98/54339).

Another example of a yeast synthetic promoter is disclosed in US 5,436,136 (Hinnen et al.), which concerns a yeast hybrid promoter including a 5' upstream promoter element comprising the upstream activation site(s) of the yeast PHO5 gene and a 3' downstream promoter element of the yeast GAPDH gene starting at nucleotide -300 to -180 and ending at nucleotide -1 of the GAPDH gene.

Another example of a yeast synthetic promoter is disclosed in US 5,089,398 (Rosenberg et al.). This disclosure describes a promoter with the general formula

(P.R.(2)-P.R.(1))

wherein:

P.R.(1) is the promoter region proximal to the coding sequence, having the transcription initiation site and the RNA polymerase binding site, and including the TATA box, the CAAT sequence, as well as translational regulatory signals, e.g. the capping sequence, as appropriate;

P.R.(2) is the promoter region joined to the 5'-end of P.R.(1), associated with enhancing the efficiency of transcription of the RNA polymerase binding region.

US 4,945,046 (Horii et al.) discloses a further example of how to design a synthetic yeast promoter. This specific promoter comprises promoter elements derived both from yeast and from a mammal. The hybrid promoter consists essentially of the Saccharomyces cerevisiae PHO5 or GAPDH promoter from which the upstream activation site (UAS) has been deleted and replaced by the early enhancer region derived from SV40 virus.

Cloning site 

The cloning site in the cassette in the primary vector should be designed so that any nucleotide sequence can be cloned into it.

The cloning site in the cassette preferably allows directional cloning. This ensures that the insert is transcribed in the intended direction in the host cell, so that the transcript corresponds to the coding strand and the translated peptide is identical to the peptide encoded by the original nucleotide sequence.

However, according to some embodiments it may be advantageous to insert the sequence in the opposite direction. According to these embodiments, so-called antisense constructs may be inserted, which prevent functional expression of specific genes involved in specific pathways. It may thereby become possible to divert metabolic intermediates from a prevalent pathway to another, less dominant pathway.
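Inserting a sequence "in the opposite direction" means that the strand read into the transcript is the reverse complement of the sense orientation. The minimal Python sketch below shows that relationship; the function names and the example sequence are hypothetical.

```python
# Watson-Crick complement table (ambiguous bases such as N map to themselves).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' -> 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def orient_insert(insert: str, antisense: bool = False) -> str:
    """Sequence placed downstream of the promoter for a sense or an antisense construct."""
    return reverse_complement(insert) if antisense else insert


print(orient_insert("ATGAAA"))                  # ATGAAA (sense)
print(orient_insert("ATGAAA", antisense=True))  # TTTCAT (antisense)
```

Applying the operation twice restores the original sequence, which is why the same insert can be used for both sense and antisense constructs.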

The cloning site in the cassette may comprise multiple cloning sites, generally known as an MCS or polylinker, which is a synthetic DNA sequence containing a series of restriction endonuclease recognition sites. These sites are engineered for convenient cloning of DNA into a vector at a specific position and for directional cloning of the insert.
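A polylinker can be checked for usable restriction sites with a plain string search. This Python sketch uses a small, assumed selection of well-known 6 and 8 bp recognition sequences and a made-up polylinker; it lists each enzyme's positions and those enzymes that cut only once, since cutting with two different single-cutters is the usual basis for directional cloning.

```python
from typing import Dict, List

# Assumed example set of recognition sequences (5' -> 3').
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT",
         "XhoI": "CTCGAG", "NotI": "GCGGCCGC"}


def find_sites(seq: str) -> Dict[str, List[int]]:
    """0-based start positions of every recognition sequence found in seq."""
    seq = seq.upper()
    found = {}
    for enzyme, site in SITES.items():
        positions = [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]
        if positions:
            found[enzyme] = positions
    return found


def unique_cutters(seq: str) -> List[str]:
    """Enzymes from SITES that cut the sequence exactly once."""
    return sorted(e for e, pos in find_sites(seq).items() if len(pos) == 1)


mcs = "GAATTCGGATCCAAGCTT"  # hypothetical polylinker
print(find_sites(mcs))      # {'EcoRI': [0], 'BamHI': [6], 'HindIII': [12]}
print(unique_cutters(mcs))  # ['BamHI', 'EcoRI', 'HindIII']
```

All sites in this set are palindromic, so a single-strand search also covers the complementary strand.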

Cloning of cDNA does not have to involve the use of restriction enzymes. Alternative systems include, but are not limited to:

- the Creator™ Cre-loxP system from Clontech, which uses recombination between loxP sites;

- the use of lambda attachment sites (att-X), such as the Gateway™ system from Life Technologies.

Both of these systems are directional.

Terminator 

The role of the terminator sequence is to limit transcription to the length of the coding sequence. An optimal terminator sequence is thus one that is capable of performing this function in the host cell.

In prokaryotes, sequences known as transcriptional terminators signal the RNA polymerase to release the DNA template and stop transcription of the nascent RNA.

In eukaryotes, RNA molecules are transcribed well beyond the end of the mature mRNA molecule. New transcripts are enzymatically cleaved and modified by the addition of a long sequence of adenylic acid residues known as the poly-A tail. A polyadenylation consensus sequence is located about 10 to 30 bases upstream from the actual cleavage site.

Preferred examples of yeast-derived terminator sequences include, but are not limited to, ADN1, CYC1, GPD and ADH1 (alcohol dehydrogenase).

Intron 

Optionally, the cassette in the vector comprises an intron sequence, which may be located 5' or 3' to the expressible nucleotide sequence. The design and layout of introns is well known in the art. The choice of intron design largely depends on the intended host cell in which the expressible nucleotide sequence is eventually to be expressed. The effects of including an intron sequence in the expression cassettes are those generally associated with intron sequences.

Examples of yeast introns can be found in the literature and in specific databases such as the Ares Lab Yeast Intron Database (Version 2.1), as updated on 15 April 2000. Earlier versions of the database as well as extracts of the database have been published in: "Genome-wide bioinformatic and molecular analysis of introns in Saccharomyces cerevisiae" by Spingola M, Grate L, Haussler D, Ares M Jr. (RNA 1999 Feb;5(2):221-34) and "Test of intron predictions reveals novel splice sites, alternatively spliced mRNAs and new introns in meiotically regulated genes of yeast" by Davis CA, Grate L, Spingola M, Ares M Jr. (Nucleic Acids Res 2000 Apr 15;28(8):1700-6).

Primary vectors (entry vectors) 

The term entry vector means a vector for storing and amplifying cDNA or other expressible nucleotide sequences using the cassettes according to the present invention. The primary vectors are preferably able to propagate in E. coli or any other suitable standard host cell. The vector should preferably be amplifiable and amenable to standard normalisation and enrichment procedures.



wo 02/059297 PCT/DK02/00056 


The primary vector may be any type of DNA that meets the basic requirements of a) being able to replicate itself in at least one suitable host organism, b) allowing insertion of foreign DNA, which is then replicated together with the vector, and c) preferably allowing selection of vector molecules that contain insertions of said foreign DNA. In a preferred embodiment the vector is able to replicate in standard hosts such as yeasts and bacteria, and it should preferably have a high copy number per host cell. It is also preferred that the vector, in addition to a host-specific origin of replication, contains an origin of replication for a single-stranded virus, such as the f1 origin for filamentous phages. This allows the production of single-stranded nucleic acid, which may be useful for normalisation and enrichment procedures of cloned sequences. A vast number of commonly used cloning vectors have been described; reference may be made to e.g. Sambrook, J., Fritsch, E.F. and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, USA, the Netherlands Culture Collection of Bacteria (www.cbs.knaw.nl/NCCB/collection.htm) or the Department of Microbial Genetics, National Institute of Genetics, Yata 1111, Mishima, Shizuoka 411-8540, Japan (www.shigen.nig.ac.jp/cvector/cvector.html). A few type-examples that are the parents of many popular derivatives are M13mp10, pUC18, λgt10 and pYAC4. Examples of primary vectors include, but are not limited to, M13K07, pBR322, pUC18, pUC19, pUC118, pUC119, pSP64, pSP65, pGEM-3, pGEM-3Z, pGEM-3Zf(-), pGEM-4, pGEM-4Z, πAN13, pBluescript II, CHARON 4A, λ+, CHARON 21A, CHARON 32, CHARON 33, CHARON 34, CHARON 35, CHARON 40, EMBL3A, λ2001, λDASH, λFIX, λgt10, λgt11, λgt18, λgt20, λgt22, λORF8, λZAP/R, pJB8, c2RB and pcos1EMBL.

Methods for cloning cDNA or genomic DNA into a vector are well known in the art. Reference may be made to J. Sambrook, E.F. Fritsch, T. Maniatis: Molecular Cloning. A Laboratory Manual (2nd edition, Cold Spring Harbor Laboratory Press, 1989).

One example of a circular model entry vector is shown in Figure 3. The vector, EVE, contains the expression cassette R1-R2-Spacer-Promoter-Multi Cloning Site-Terminator-Spacer-R2-R1. The vector furthermore contains a gene for ampicillin resistance, AmpR, and an origin of replication for E. coli, ColE1.

The entry vectors EVE4, EVE5 and EVE6 are shown in Figures 4, 5 and 6. These all contain SrfI as R1 and AscI as R2. Both of these sites are palindromic and are regarded as rare restriction sites, having 8 bases in the recognition sequence. The vectors furthermore contain the AmpR ampicillin resistance gene and the ColE1 origin of replication for E. coli, as well as f1, which is an origin of replication for filamentous phages such as M13. EVE4 (Fig. 4) contains the MET25 promoter and the ADH1 terminator. Spacer 1 and spacer 2 are short sequences derived from the multiple cloning site, MCS. EVE5 (Fig. 5) contains the CUP1 promoter and the ADH1 terminator. EVE6 (Fig. 6) contains the CUP1 promoter and the ADH1 terminator. The spacers of EVE6 are a 550 bp lambda phage DNA fragment (spacer 3) and an ARS sequence from yeast (spacer 4).
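Why SrfI (GCCCGGGC) and AscI (GGCGCGCC) count as "rare" palindromic sites can be checked directly. The Python sketch below assumes a random sequence with equal base frequencies, in which an 8 bp site is expected about once every 4^8 = 65,536 bp, compared with 4,096 bp for a 6 bp site. It also flags inserts that happen to contain either site internally; screening for that is an assumption about how such sites would be used, not a step stated in the text. Function names are hypothetical.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")
RARE_SITES = {"SrfI": "GCCCGGGC", "AscI": "GGCGCGCC"}


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """A site is palindromic when it reads the same on both strands (5' -> 3')."""
    return site.upper() == reverse_complement(site)


def expected_spacing(site_length: int) -> int:
    """Average distance between occurrences in random DNA with equal base frequencies."""
    return 4 ** site_length


def internal_sites(insert: str) -> List[str]:
    """Rare sites present inside an insert (one strand suffices, as the sites are palindromic)."""
    return sorted(name for name, site in RARE_SITES.items() if site in insert.upper())


print([is_palindromic(s) for s in RARE_SITES.values()])  # [True, True]
print(expected_spacing(8), expected_spacing(6))          # 65536 4096
print(internal_sites("ATGGCGCGCCTAA"))                   # ['AscI']
```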

Nucleotide library (entry library) 

Methods as well as suitable vectors and host cells for constructing and maintaining a library of nucleotide sequences in a cell are well known in the art. The primary requirement for the library is that it should be possible to store and amplify in it a number of primary vectors (constructs) according to this invention, the vectors (constructs) comprising expressible nucleotide sequences from at least one expression state, wherein at least two vectors (constructs) are different.

One specific example of such a library is the well-known and widely employed cDNA library. The main advantage of a cDNA library is that it contains only DNA sequences corresponding to messenger RNA transcribed in a cell. Suitable methods are also available to purify the isolated mRNA or the synthesised cDNA so that only substantially full-length cDNA is cloned into the library.

Methods for optimising the process to yield substantially full-length cDNA may comprise size selection, e.g. electrophoresis, chromatography or precipitation, or may comprise ways of increasing the likelihood of obtaining full-length cDNAs, e.g. the SMART™ method (Clontech) or the CapTrap™ method (Stratagene).

Preferably the method for making the nucleotide library comprises obtaining a substantially full-length cDNA population comprising a normalised representation of cDNA species. More preferably, the substantially full-length cDNA population comprises a normalised representation of cDNA species characteristic of a given expression state.

Normalisation reduces the redundancy of clones representing abundant mRNA species and increases the relative representation of clones from rare mRNA species.
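The effect of normalisation on relative representation can be shown with an idealised toy model. This Python sketch is not a laboratory protocol: it assumes that normalisation simply caps the number of clones per mRNA species, and the species labels and clone counts are made up.

```python
from typing import Dict


def relative_representation(counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of all clones contributed by each species."""
    total = sum(counts.values())
    return {species: n / total for species, n in counts.items()}


def idealised_normalisation(counts: Dict[str, int], cap: int) -> Dict[str, int]:
    """Toy model: keep at most `cap` clones of each species."""
    return {species: min(n, cap) for species, n in counts.items()}


# Hypothetical clone counts before normalisation.
library = {"abundant": 900, "medium": 90, "rare": 10}
before = relative_representation(library)
after = relative_representation(idealised_normalisation(library, cap=10))
print(before["rare"], after["rare"])  # 0.01 rises to about 0.333
```

Real normalisation protocols (for example the hybridisation-based methods cited below) reduce but rarely eliminate differences in abundance.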

Methods for normalisation of cDNA libraries are well known in the art. Reference may be made to suitable protocols for normalisation such as those described in US 5,763,239 (DIVERSA), WO 95/08647 and WO 95/11986, and in Bonaldo, Lennon, Soares, Genome Research 1996, 6:791-806; Ali, Holloway, Taylor, Plant Mol Biol Reporter 2000, 18:123-132.

Enrichment methods are used to isolate clones representing mRNAs that are characteristic of a particular expression state. A number of variations of the method broadly termed subtractive hybridisation are known in the art. Reference may be made to Sive, John, Nucleic Acids Res 1988, 16:10937; Diatchenko, Lau, Campbell et al., PNAS 1996, 93:6025-6030; Carninci, Shibata, Hayatsu, Genome Res 2000, 10:1617-30; Bonaldo, Lennon, Soares, Genome Research 1996, 6:791-806; Ali, Holloway, Taylor, Plant Mol Biol Reporter 2000, 18:123-132. For example, enrichment may be achieved by performing additional rounds of hybridisation similar to normalisation procedures, using e.g. cDNA from a library of abundant clones, or simply a library representing the uninduced state, as a driver against a tester library from the induced state. Alternatively, mRNA or PCR-amplified cDNA derived from the expression state of choice can be used to subtract common sequences from a tester library. The choice of driver and tester population will depend on the nature of the target expressible nucleotide sequences in each particular experiment.

In the library, an expressible nucleotide sequence coding for one peptide is preferably found in different but similar vectors under the control of different promoters. Preferably the library comprises at least three primary vectors with an expressible nucleotide sequence coding for the same peptide under the control of three different promoters. More preferably the library comprises at least four primary vectors with an expressible nucleotide sequence coding for the same peptide under the control of four different promoters. More preferably the library comprises at least




five primary vectors with an expressible nucleotide sequence coding for the same peptide under the control of five different promoters, such as at least six primary vectors with an expressible nucleotide sequence coding for the same peptide under the control of six different promoters, for example at least seven primary vectors with an expressible nucleotide sequence coding for the same peptide under the control of seven different promoters, for example at least eight primary vectors with an expressible nucleotide sequence coding for the same peptide under the control of eight different promoters, such as at least nine primary vectors with an expressible nucleotide sequence coding for the same peptide under the control of nine different promoters, for example at least ten primary vectors with an expressible nucleotide sequence coding for the same peptide under the control of ten different promoters.

The expressible nucleotide sequence coding for the same peptide preferably comprises essentially the same nucleotide sequence, more preferably the same nucleotide sequence.

By having a library with what may be termed one gene under the control of a number of different promoters in different vectors, it is possible to construct from the nucleotide library an array of combinations of genes and promoters. Preferably, one library comprises a complete or substantially complete combination, such as a two-dimensional array of genes and promoters, wherein substantially all genes are found under the control of substantially all of a selected number of promoters.

According to another embodiment of the invention, the nucleotide library comprises combinations of expressible nucleotide sequences combined in different vectors with different spacer sequences and/or different intron sequences. Thus any one expressible nucleotide sequence may be combined in a two-, three-, four- or five-dimensional array with different promoters and/or different spacers and/or different introns and/or different terminators. The two-, three-, four- or five-dimensional array may be complete or incomplete, since not all combinations have to be present.
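A complete array of this kind is the Cartesian product of the chosen factors, and an incomplete array is a subset of it. The Python sketch below enumerates designs as dictionaries. The factor names, gene labels and element labels are hypothetical placeholders, and the exclusion rule in the example is arbitrary.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence

Design = Dict[str, str]


def full_array(factors: Dict[str, Sequence[str]]) -> List[Design]:
    """Every combination of the given factor levels (a complete n-dimensional array)."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*(factors[n] for n in names))]


def incomplete_array(factors: Dict[str, Sequence[str]], keep: Callable[[Design], bool]) -> List[Design]:
    """Subset of the complete array for which keep(design) is True."""
    return [d for d in full_array(factors) if keep(d)]


genes = ["geneA", "geneB", "geneC"]
promoters = ["PGK1", "CUP1", "GAL1", "MET25"]
two_d = full_array({"gene": genes, "promoter": promoters})
three_d = full_array({"gene": genes, "promoter": promoters, "spacer": ["lambda", "ARS"]})
partial = incomplete_array({"gene": genes, "promoter": promoters},
                           keep=lambda d: not (d["gene"] == "geneC" and d["promoter"] == "GAL1"))
print(len(two_d), len(three_d), len(partial))  # 12 24 11
```

Each extra dimension multiplies the size of a complete array by its number of levels, which is one reason incomplete arrays may be preferred in practice.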

The library may suitably be maintained in a host cell comprising prokaryotic cells or eukaryotic cells. Preferred prokaryotic host organisms include, but are not limited to, Escherichia coli, Bacillus subtilis, Streptomyces lividans, Streptomyces coelicolor, Pseudomonas aeruginosa and Myxococcus xanthus.

Yeast species such as Saccharomyces cerevisiae (budding yeast), Schizosaccharomyces pombe (fission yeast), Pichia pastoris and Hansenula polymorpha (methylotrophic yeasts) may also be used. Filamentous ascomycetes, such as Neurospora crassa and Aspergillus nidulans, may also be used. Plant cells such as those derived from Nicotiana and Arabidopsis are preferred. Preferred mammalian host cells include, but are not limited to, those derived from humans, monkeys and rodents, such as Chinese hamster ovary (CHO) cells, NIH/3T3, COS, 293, VERO, HeLa etc. (see Kriegler M. in "Gene Transfer and Expression: A Laboratory Manual", New York, Freeman & Co. 1990).



Concatemers 

A concatemer is a series of linked units. In the present context, a concatemer denotes a number of serially linked nucleotide cassettes, wherein at least two of the serially linked nucleotide units comprise a cassette having the basic structure

[rs2-SP-PR-X-TR-SP-rs1]

wherein

rs1 and rs2 together denote a restriction site,
each SP individually denotes a spacer of at least two nucleotide bases,
PR denotes a promoter capable of functioning in a cell,
X denotes an expressible nucleotide sequence, and
TR denotes a terminator.

Optionally the cassettes comprise an intron sequence between the promoter and the expressible nucleotide sequence and/or between the expressible nucleotide sequence and the terminator.
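The cassette structure can be represented as a small data type that renders its elements in the order of the formula, including the optional introns. In this Python sketch the elements are labels rather than DNA sequences, and the class name and the labels used in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cassette:
    """One [rs2-SP-PR-X-TR-SP-rs1] unit; fields hold element labels, not sequences."""
    promoter: str                   # PR
    expressible: str                # X
    terminator: str                 # TR
    spacer5: str = "SP"             # spacer 5' to the promoter
    spacer3: str = "SP"             # spacer 3' to the terminator
    intron5: Optional[str] = None   # optional intron between PR and X
    intron3: Optional[str] = None   # optional intron between X and TR

    def layout(self) -> List[str]:
        parts = ["rs2", self.spacer5, self.promoter]
        if self.intron5 is not None:
            parts.append(self.intron5)
        parts.append(self.expressible)
        if self.intron3 is not None:
            parts.append(self.intron3)
        parts += [self.terminator, self.spacer3, "rs1"]
        return parts

    def __str__(self) -> str:
        return "[" + "-".join(self.layout()) + "]"


def concatemer(cassettes: List[Cassette]) -> str:
    """Serially linked cassettes, written 5' -> 3'."""
    return "".join(str(c) for c in cassettes)


print(Cassette("GAL1", "geneA", "ADH1t"))                  # [rs2-SP-GAL1-geneA-ADH1t-SP-rs1]
print(Cassette("CUP1", "geneB", "CYC1t", intron5="INT1"))  # [rs2-SP-CUP1-INT1-geneB-CYC1t-SP-rs1]
```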

The expressible nucleotide sequence in the cassettes of the concatemer may 
comprise a DNA sequence selected from the group comprising cDNA and genomic 
DNA. 



According to one aspect of the invention, a concatemer comprises cassettes with expressible nucleotide sequences from different expression states, so that non-naturally occurring or non-native combinations of expressible nucleotide sequences are obtained. These different expression states may represent at least two different tissues, such as at least two organs, such as at least two species, such as at least two genera. The different species may be from at least two different phyla, such as from at least two different classes, such as from at least two different divisions, more preferably from at least two different sub-kingdoms, such as from at least two different kingdoms.

For example, the expressible nucleotide sequences may originate from eukaryotes such as mammals, e.g. humans, mice or whales; from reptiles such as snakes, crocodiles or turtles; from tunicates such as sea squirts; from lepidoptera such as butterflies and moths; from coelenterates such as jellyfish, anemones or corals; from fish such as bony and cartilaginous fish; from plants such as dicots, e.g. coffee or oak, or monocots such as grasses, lilies and orchids; from lower plants such as algae and ginkgo; from higher fungi such as terrestrial fruiting fungi; or from marine actinomycetes. The expressible nucleotide sequences may also originate from protozoans such as malaria parasites or trypanosomes, or from prokaryotes such as E. coli or archaebacteria. Furthermore, the expressible nucleotide sequences may originate from one or, more preferably, several expression states from the species and genera listed in the table below.



Bacteria: Streptomyces, Micromonospora, Nocardia, Actinomadura, Actinoplanes, Streptosporangium, Microbispora, Kitasatosporia, Azobacterium, Rhizobium, Achromobacterium, Enterobacterium, Brucella, Micrococcus, Lactobacillus, Bacillus (B.t. toxins), Clostridium (toxins), Brevibacterium, Pseudomonas, Aerobacter, Vibrio, Halobacterium, Mycoplasma, Cytophaga, Myxococcus

Fungi: Amanita muscaria (fly agaric; ibotenic acid, muscimol), Psilocybe (psilocybin), Physarum, Fuligo, Mucor, Phytophthora, Rhizopus, Aspergillus, Penicillium (penicillin), Coprinus, Phanerochaete, Acremonium (cephalosporin), Trichoderma, Helminthosporium, Fusarium, Alternaria, Myrothecium, Saccharomyces

Algae: Digenea simplex (kainic acid, anthelmintic), Laminaria angustata (laminine, hypotensive)

Lichens: Usnea fasciata (vulpinic acid, antimicrobial; usnic acid, antitumor)

Higher plants: Artemisia (artemisinin), Coleus (forskolin), Desmodium (K channel agonist), Catharanthus (Vinca alkaloids), Digitalis (cardiac glycosides), Podophyllum (podophyllotoxin), Taxus (taxol), Cephalotaxus (homoharringtonine), Camptotheca (camptothecin), Camellia sinensis (tea), Cannabis indica, Cannabis sativa (hemp), Erythroxylum coca (coca), Lophophora williamsii (peyote), Myristica fragrans (nutmeg), Nicotiana, Papaver somniferum (opium poppy), Phalaris arundinacea (reed canary grass)

Protozoa: Ptychodiscus brevis; dinoflagellates (brevitoxin, cardiovascular)

Sponges: Microciona prolifera (ectyonin, antimicrobial), Cryptotethya crypta (D-arabinofuranosides)

Coelenterata: Portuguese man-of-war and other jellyfish and medusoid toxins

Corals: Pseudopterogorgia species (pseudopterosins, anti-inflammatory), Erythropodium (erythrolides, anti-inflammatory)

Aschelminths: nematode secretory compounds

Molluscs: Conus toxins, sea slug toxins, cephalopod neurotransmitters, squid inks

Annelida: Lumbriconereis heteropoda (nereistoxin, insecticidal)

Arachnids: Dolomedes ("fishing spider" venoms)

Crustacea: Xenobalanus (skin adhesives)

Insects: Epilachna (Mexican bean beetle alkaloids)

Sipunculida: Bonellia viridis (bonellin, neuroactive)

Bryozoans: Bugula neritina (bryostatins, anticancer)

Echinoderms: crinoid chemistry

Tunicates: Trididemnum solidum (didemnin, anti-tumor and anti-viral), Ecteinascidia turbinata (ecteinascidins, anti-tumor)

Vertebrates: Eptatretus stoutii (eptatretin, cardioactive), Trachinus draco (proteinaceous toxins that reduce blood pressure, respiration and heart rate), dendrobatid frogs (batrachotoxins, pumiliotoxins, histrionicotoxins and other polyamines), snake venom toxins, Ornithorhynchus anatinus (duck-billed platypus venom), modified carotenoids, retinoids and steroids; avians: histrionicotoxins, modified carotenoids, retinoids and steroids

According to a preferred embodiment of the invention, the concatemer comprises at least a first cassette and a second cassette, said first cassette being different from said second cassette. More preferably, the concatemer comprises cassettes wherein substantially all cassettes are different. The difference between the cassettes may arise from differences between promoters and/or expressible nucleotide sequences and/or spacers and/or terminators and/or introns.

The number of cassettes in a single concatemer is largely determined by the host species into which the concatemer is eventually to be inserted and the vector through which the insertion is carried out. The concatemer thus may comprise at least 10 cassettes, such as at least 15, for example at least 20, such as at least 25, for example at least 30, such as from 30 to 60 or more than 60, such as at least 75, for example at least 100, such as at least 200, for example at least 500, such as at least 750, for example at least 1000, such as at least 1500, for example at least 2000 cassettes.

Each of the cassettes may be laid out as described above.

Once the concatemer has been assembled or concatenated, it may be ligated into a
suitable vector. Such a vector may advantageously comprise an artificial
chromosome. The basic requirements for a functional artificial chromosome have
been described in US 4,464,472, the contents of which are hereby incorporated by
reference. An artificial chromosome, or a functional minichromosome as it may also
be termed, must comprise a DNA sequence capable of replication and stable mitotic
maintenance in a host cell, comprising a DNA segment coding for centromere-like
activity during mitosis of said host and a DNA sequence coding for a replication site
recognized by said host.

Suitable artificial chromosomes include a Yeast Artificial Chromosome (YAC) (see
e.g. Murray et al. Nature 305:189-193; or US 4,464,472), a mega Yeast Artificial
Chromosome (mega YAC), a Bacterial Artificial Chromosome (BAC), a mouse
artificial chromosome, a Mammalian Artificial Chromosome (MAC) (see e.g. US
6,133,503 or US 6,077,697), an Insect Artificial Chromosome (BUGAC), an Avian
Artificial Chromosome (AVAC), a Bacteriophage Artificial Chromosome, a
Baculovirus Artificial Chromosome, a plant artificial chromosome (US 5,270,201), a
BIBAC vector (US 5,977,439) or a Human Artificial Chromosome (HAC).

The artificial chromosome is preferably so large that the host cell perceives it as a
"real" chromosome and maintains and transmits it as a chromosome. For yeast
and other suitable host species, this will often correspond approximately to the size
of the smallest native chromosome in the species. For Saccharomyces, the smallest
chromosome has a size of 225 kb.

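As a rough sizing aid, the number of cassettes needed for a construct to reach a target size such as the 225 kb quoted above can be estimated from the average cassette length and the total length of the vector arms. This is a sketch only: the function name is hypothetical, and the 3 kb average cassette and 10 kb of vector arms in the usage line are assumed values, not figures from this document.

```python
import math


def cassettes_needed(target_bp: int, avg_cassette_bp: int, arms_bp: int) -> int:
    """Smallest whole number of cassettes that brings the construct to target_bp.

    Assumes (hypothetically) that all cassettes have the same average length and
    that the construct is simply the vector arms plus the concatenated cassettes.
    """
    if avg_cassette_bp <= 0:
        raise ValueError("average cassette length must be positive")
    # Length the concatemer itself has to supply once the arms are counted.
    remaining = max(0, target_bp - arms_bp)
    return math.ceil(remaining / avg_cassette_bp)


# Assumed values: 3 kb average cassette, 10 kb of vector arms in total.
print(cassettes_needed(225_000, 3_000, 10_000))  # 72
```

Real cassette lengths vary with the cDNA inserts, so the result is best read as an order-of-magnitude guide.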
MACs may be used to construct artificial chromosomes from other species, such as
insect and fish species. The artificial chromosomes preferably are fully functional
stable chromosomes. Two types of artificial chromosomes may be used. One type,
referred to as SATACs [satellite artificial chromosomes], are stable heterochromatic
chromosomes, and the other type are minichromosomes based on amplification of
euchromatin.

Mammalian artificial chromosomes provide extra-genomic specific integration sites
for introduction of genes encoding proteins of interest and permit megabase-size
DNA integration, such as integration of concatemers according to the invention.

According to another embodiment of the invention, the concatemer may be
integrated into the host chromosomes or cloned into other types of vectors, such as
a plasmid vector, a phage vector, a viral vector or a cosmid vector.

A preferable artificial chromosome vector is one that is capable of being
conditionally amplified in the host cell, e.g. in yeast. The amplification preferably is at
least a 10-fold amplification. Furthermore, it is advantageous that the cloning site of
the artificial chromosome vector can be modified to comprise the same restriction
site as the one bordering the cassettes described above, i.e. RS2 and/or RS2'.
Concatenation

Cassettes to be concatenated are normally excised from a vector either by digestion
with restriction enzymes or by PCR. After excision, the cassettes may be separated
from the vector through size fractionation, such as gel filtration, or through tagging of
known sequences in the cassettes. The isolated cassettes may then be joined
together either through interaction between sticky ends or through ligation of blunt
ends.

Single-stranded compatible ends may be created by digestion with restriction
enzymes. For concatenation, a preferred enzyme for excising the cassettes would be a
rare cutter, i.e. an enzyme that recognises a sequence of 7 or more nucleotides.
Examples of enzymes that cut very rarely are the meganucleases, many of which
are intron encoded, e.g. I-Ceu I, I-Sce I, I-Ppo I, and PI-Psp I (see example 6d for
more). Other preferred enzymes recognize a sequence of 8 nucleotides, e.g. Asc
I, AsiS I, CciN I, CspB I, Fse I, MchA I, Not I, Pac I, Sbf I, Sda I, Sgf I, SgrA I,
Sse232 I, and Sse8387 I, all of which create single-stranded, palindromic compatible
ends.

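Why an 8-base recognition site makes a rare cutter follows from a simple estimate: in random sequence with equal base frequencies, a fully specified n-base site occurs on average once every 4^n bp, i.e. about every 4 kb for a 6-base site and about every 65.5 kb for an 8-base site. Real genomes depart from equal base composition (GC-rich sites such as AscI and NotI are rarer in AT-rich DNA), so this is only a first approximation. The sketch below, with hypothetical function names and a made-up demo fragment, computes that estimate and scans a sequence for some of the 8-base sites listed above.

```python
# Recognition sequences (top strand, 5'->3') of some 8-base cutters named above.
RECOGNITION_SITES = {
    "AscI": "GGCGCGCC",
    "NotI": "GCGGCCGC",
    "PacI": "TTAATTAA",
    "FseI": "GGCCGGCC",
    "SbfI": "CCTGCAGG",
}


def expected_spacing(site_length: int) -> int:
    """Expected bp between sites in random DNA with equal base frequencies (4^n)."""
    return 4 ** site_length


def find_sites(sequence: str, site: str) -> list[int]:
    """0-based start of every occurrence of site on the given strand.

    The sites above are palindromic, so scanning one strand finds all of them.
    """
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


print(expected_spacing(6), expected_spacing(8))  # 4096 65536
demo = "aaGGCGCGCCttGCGGCCGCaa"  # hypothetical fragment with one AscI and one NotI site
for name, site in RECOGNITION_SITES.items():
    print(name, find_sites(demo, site))
```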
Other preferred rare cutters, which may also be used to control the orientation of
individual cassettes in the concatemer, are enzymes that recognize non-palindromic
sequences, e.g. Aar I, Sap I, Sfi I, Sdi I, and Vpa (see example 6c for more).

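The orientation control mentioned above can be reasoned about with a small model. Assume both ends of every cassette were cut so as to expose the same overhang sequence s, giving the left end a 5' overhang s on the top strand and the right end its reverse complement on the bottom strand. Head-to-tail joining then always works, whereas head-to-head or tail-to-tail joining requires s to equal its own reverse complement, i.e. to be palindromic. The sketch below, with hypothetical function names, checks this for the 4-nt AscI overhang CGCG and for an arbitrary 3-nt overhang, GAT, of the kind a non-palindromic cutter such as SapI can leave.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def ends_anneal(overhang_a: str, overhang_b: str) -> bool:
    """Two 5' overhangs, each read 5'->3' on its own protruding strand,
    base-pair if one is the reverse complement of the other."""
    return overhang_a.upper() == revcomp(overhang_b)


def allowed_junctions(overhang: str) -> dict[str, bool]:
    """Junction types possible for cassettes whose left end carries the 5'
    overhang `overhang` (top strand) and whose right end carries its reverse
    complement (bottom strand), as after cutting identical flanking sites."""
    left = overhang.upper()
    right = revcomp(overhang)
    return {
        "head-to-tail": ends_anneal(right, left),
        "head-to-head": ends_anneal(right, right),
        "tail-to-tail": ends_anneal(left, left),
    }


print(allowed_junctions("CGCG"))  # AscI overhang: palindromic, all three True
print(allowed_junctions("GAT"))   # hypothetical 3-nt overhang: head-to-tail only
```

An overhang of odd length can never equal its own reverse complement, because its central base would have to be its own complement; 3-nt overhangs therefore always restrict joining to a single orientation in this model.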
Alternatively, cassettes can be prepared by the addition of restriction sites to the
ends, e.g. by PCR or ligation to linkers (short synthetic dsDNA molecules).
Restriction enzymes are continuously being isolated and characterised, and it is
anticipated that many such novel enzymes can be used to generate single-stranded
compatible ends according to the present invention.

It is conceivable that single-stranded compatible ends can be made by cleaving the
vector with synthetic cutters. Thus, a reactive chemical group that would normally
cleave DNA unspecifically can cut at specific positions when coupled to
another molecule that recognises and binds to specific sequences. Examples of
molecules that recognise specific dsDNA sequences are DNA, PNA, LNA,
phosphorothioates, peptides, and amides. See e.g. Armitage, B. (1998) Chem. Rev.
98: 1171-1200, who describes photocleavage using e.g. anthraquinone and UV
light; Dervan P.B. & Bürli R.W. (1999) Curr. Opin. Chem. Biol. 3: 688-93, who describe
the specific binding of polyamides to DNA; Nielsen, P.E. (2001) Curr. Opin.
Biotechnol. 12: 16-20, who describes the specific binding of PNA to DNA; and the Chemical
Reviews special thematic issue RNA/DNA Cleavage (1998) vol. 98 (3), Bashkin J.K.
(ed.), ACS Publications, which describes several examples of chemical DNA cleavers.

Single-stranded compatible ends may also be created by using e.g. PCR primers
including dUTP and then treating the PCR product with Uracil-DNA glycosylase
(Ref: US 5,035,996) to degrade part of the primer. Alternatively, compatible ends
can be created by tailing both the vector and insert with complementary nucleotides
using Terminal Transferase (Chang LMS, Bollum FJ (1971) J Biol Chem 246:909).

It is also conceivable that recombination can be used to generate concatemers, e.g.
through modification of techniques like the Creator™ system (Clontech), which
uses the Cre-loxP mechanism (Sauer B 1993 Methods Enzymol 225:890-900) to
directionally join DNA molecules by recombination, or like the Gateway™ system
(Life Technologies, US 5,888,732), which uses lambda att attachment sites for directional
recombination (Landy A 1989, Ann Rev Biochem 58:913). It is envisaged that
lambda cos site-dependent systems can also be developed to allow concatenation.

More preferably, the cassettes may be concatenated without an intervening
purification step through excision from a vector with two restriction enzymes, one
leaving sticky ends on the cassettes and the other leaving blunt ends in the
vectors. This is the preferred method for concatenation of cassettes from vectors
having the basic structure [RS1-RS2-SP-PR-X-TR-SP-RS2'-RS1'].

An alternative way of producing concatemers free of vector sequences would be to
PCR amplify the cassettes from a single-stranded primary vector. The PCR product
must include the restriction sites RS2 and RS2', which are subsequently cleaved by
their cognate enzyme(s). Concatenation can then be performed using the digested
PCR product, essentially without interference from the single-stranded primary
vector template or the small double-stranded fragments which have been cut from
the ends.



The concatemer may be assembled or concatenated by concatenation of at least
two cassettes of nucleotide sequences, each cassette comprising a first sticky end, a
spacer sequence, a promoter, an expressible nucleotide sequence, a terminator, a
spacer sequence, and a second sticky end. A flow chart of the procedure is shown
in figure 2a.

Preferably, concatenation further comprises

starting from a primary vector [RS1-RS2-SP-PR-X-TR-SP-RS2'-RS1'],
wherein X denotes an expressible nucleotide sequence,
RS1 and RS1' denote restriction sites,
RS2 and RS2' denote restriction sites different from RS1 and RS1',
SP individually denotes a spacer sequence of at least two nucleotides,
PR denotes a promoter,
TR denotes a terminator,
i) cutting the primary vector with the aid of at least one restriction
enzyme specific for RS2 and RS2', obtaining cassettes having the
general formula [rs2-SP-PR-X-TR-SP-rs1], wherein rs1 and rs2 together
denote a functional restriction site RS2 or RS2',
ii) assembling the cut-out cassettes through interaction between rs1 and
rs2.

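Steps i) and ii) can be made concrete with a toy, top-strand-only model. The sketch assumes AscI (GG^CGCGCC) as RS2/RS2', as in the Examples below, and SrfI (GCCC^GGGC, a blunt cutter) as RS1/RS1'; SrfI is also used in the Examples, but its assignment to RS1 here is an assumption. The function names and short body sequences are hypothetical stand-ins for SP-PR-X-TR-SP. Cutting at RS2 leaves each cassette beginning with rs2 = CGCGCC and ending with rs1 = GG, and joining cassettes head-to-tail reconstitutes a complete RS2 site at every junction.

```python
RS1 = "GCCCGGGC"  # SrfI site, blunt cut (assumed to play the role of RS1/RS1')
RS2 = "GGCGCGCC"  # AscI site (RS2/RS2'); AscI cuts GG^CGCGCC on the top strand
RS2_CUT = 2       # top-strand cut position within RS2


def primary_vector(body: str) -> str:
    """Top strand of [RS1-RS2-body-RS2'-RS1']; body stands in for SP-PR-X-TR-SP.

    The vector backbone beyond RS1/RS1' is omitted for brevity.
    """
    if RS2 in body:
        raise ValueError("body must not contain an internal RS2 site")
    return RS1 + RS2 + body + RS2 + RS1


def excise(vector: str) -> str:
    """Cut both RS2 sites and return the cassette [rs2-body-rs1] (top strand)."""
    first = vector.find(RS2)
    second = vector.find(RS2, first + 1)
    if first == -1 or second == -1:
        raise ValueError("expected two RS2 sites flanking the cassette")
    return vector[first + RS2_CUT:second + RS2_CUT]


def concatenate(cassettes: list[str]) -> str:
    """Head-to-tail assembly: each rs1 end ('GG') meets the next rs2 end ('CGCGCC')."""
    return "".join(cassettes)


cassettes = [excise(primary_vector(b)) for b in ("ATGAAATAA", "ATGCCCTAA", "ATGTTTTAA")]
print(cassettes[0])                       # CGCGCCATGAAATAAGG
print(concatenate(cassettes).count(RS2))  # 2 reconstituted RS2 sites
```

Because the AscI overhang (CGCG) is palindromic, real ligations can also join cassettes head-to-head or tail-to-tail; the model follows only head-to-tail joining.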
In this way at least 10 cassettes can be concatenated, such as at least 15, for
example at least 20, such as at least 25, for example at least 30, such as from 30 to
60 or more than 60, such as at least 75, for example at least 100, such as at least
200, for example at least 500, such as at least 750, for example at least 1000, such
as at least 1500, for example at least 2000.

According to an especially preferred embodiment, vector arms each having an RS2
or RS2' in one end and a non-complementary overhang or a blunt end in the other
end are added to the concatenation mixture together with the cassettes described
above to further simplify the procedure (see Fig. 2b). One example of a suitable
vector for providing vector arms is disclosed in Fig. 7. TRP1, URA3, and HIS3 are
auxotrophic marker genes, and AmpR is an E. coli antibiotic marker gene. CEN4 is
a centromere and TEL are telomeres. ARS1 and pMB1 allow replication in yeast and
E. coli, respectively. BamH I and Asc I are restriction enzyme recognition sites. The
nucleotide sequence of the vector is set forth in SEQ ID NO 4. The vector is
digested with BamHI and AscI to liberate the vector arms, which are used for ligation
to the concatemer.

The ratio of vector arms to cassettes determines the maximum number of cassettes
in the concatemer, as illustrated in figure 8. The vector arms preferably are artificial
chromosome vector arms such as those described in Fig. 7.

It is of course also possible to add stopper fragments to the concatenation solution,
the stopper fragments each having an RS2 or RS2' in one end and a
non-complementary overhang or a blunt end in the other end. The ratio of stopper
fragments to cassettes can likewise control the maximum size of the concatemer.

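The arithmetic behind controlling concatemer size with vector arms or stopper fragments can be sketched as follows. If ligation ran to completion and every linear molecule were capped at both ends, the number of concatemers would be half the number of cap fragments, so the average number of cassettes per concatemer is 2 × (moles of cassettes) / (moles of caps). This estimates a mean rather than the maximum, ignores circularisation and unligated ends, and uses an assumed 650 g/mol per base pair to convert mass to moles; the function names and the fragment masses and lengths in the usage line are hypothetical. When the two arms differ in length, their picomole amounts should be computed separately and added.

```python
BP_MASS_G_PER_MOL = 650  # assumed average molar mass of one base pair of dsDNA


def pmol(mass_ng: float, length_bp: int) -> float:
    """Picomoles of linear dsDNA molecules in mass_ng of a length_bp fragment."""
    return mass_ng * 1000 / (length_bp * BP_MASS_G_PER_MOL)


def mean_cassettes_per_molecule(cassette_ng: float, cassette_bp: int,
                                cap_ng: float, cap_bp: int) -> float:
    """Average cassettes per concatemer if ligation runs to completion, no
    circles form, and every molecule ends up with exactly two caps (vector
    arms or stopper fragments): cassettes / (caps / 2)."""
    molecules = pmol(cap_ng, cap_bp) / 2
    return pmol(cassette_ng, cassette_bp) / molecules


# Hypothetical amounts: 650 ng of 1 kb cassettes, 65 ng of 1 kb caps.
print(round(mean_cassettes_per_molecule(650, 1000, 65, 1000), 6))  # 20.0
```

Doubling the amount of caps halves the mean, which is the direction of control the text describes.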
The complete sequence of steps, starting with the isolation of mRNA and ending with
insertion into an entry vector, may include the following:

i) isolating mRNA from an expression state,

ii) obtaining substantially full-length cDNA corresponding to the mRNA
sequences,

iii) inserting the substantially full-length cDNA into a cloning site in a
cassette in a primary vector, said cassette being of the general
formula in 5'->3' direction:
[RS1-RS2-SP-PR-CS-TR-SP-RS2'-RS1']
wherein CS denotes a cloning site.

In preparation of the concatemer, genes may be isolated from different entry
libraries to provide the desired selection of genes. Accordingly, concatenation may
further comprise selection of vectors having expressible nucleotide sequences from
at least two different expression states, such as from two different species. The two
different species may be from two different classes, such as from two different
divisions, more preferably from two different sub-kingdoms, such as from two
different kingdoms.

As an alternative to including vector arms in the concatenation reaction, it is possible
to ligate the concatemer into an artificial chromosome selected from the group
comprising yeast artificial chromosome, mega yeast artificial chromosome, bacterial
artificial chromosome, mouse artificial chromosome and human artificial chromosome.

Preferably at least one inserted concatemer further comprises a selectable marker.
The marker(s) are conveniently not included in the concatemer as such but rather in
an artificial chromosome vector into which the concatemer is inserted. Selectable
markers generally provide a means to select, for growth, only those cells which
contain a vector. Such markers are of two types: drug resistance and auxotrophy. A
drug resistance marker enables cells to grow in the presence of an otherwise toxic
compound. Auxotrophic markers allow cells to grow in media lacking an essential
component by enabling cells to synthesise the essential component (usually an
amino acid).

Illustrative and non-limiting examples of common compounds for which selectable
markers are available, with a brief description of their mode of action, follow:

Prokaryotic 

• Ampicillin: interferes with a terminal reaction in bacterial cell wall synthesis. 
The resistance gene (bla) encodes beta-lactamase which cleaves the beta- 
lactam ring of the antibiotic thus detoxifying it. 

• Tetracycline: prevents bacterial protein synthesis by binding to the 30S
ribosomal subunit. The resistance gene (tet) specifies a protein that modifies
the bacterial membrane and prevents accumulation of the antibiotic in the
cell.

• Kanamycin: binds to the 70S ribosomes and causes misreading of
messenger RNA. The resistance gene (nptII) modifies the antibiotic and
prevents interaction with the ribosome.

• Streptomycin: binds to the 30S ribosomal subunit, causing misreading of
messenger RNA. The resistance gene (Sm) modifies the antibiotic and
prevents interaction with the ribosome.

• Zeocin: this new bleomycin-family antibiotic intercalates into the DNA and
cleaves it. The Zeocin resistance gene encodes a 13,665 dalton protein. This
protein confers resistance to Zeocin by binding to the antibiotic and
preventing it from binding DNA. Zeocin is effective on most aerobic cells and
can be used for selection in mammalian cell lines, yeast, and bacteria.
Eukaryotic

• Hygromycin: an aminocyclitol that inhibits protein synthesis by disrupting
ribosome translocation and promoting mistranslation. The resistance gene
(hph) detoxifies hygromycin B by phosphorylation.

• Histidinol: cytotoxic to mammalian cells by inhibiting histidyl-tRNA synthesis
in histidine-free media. The resistance gene (hisD) product inactivates
histidinol toxicity by converting it to the essential amino acid histidine.

• Neomycin (G418): blocks protein synthesis by interfering with ribosomal
functions. The resistance gene APH encodes aminoglycoside
phosphotransferase, which detoxifies G418.

• Uracil: Laboratory yeast strains carrying a mutated gene which encodes
orotidine-5'-phosphate decarboxylase, an enzyme essential for uracil
biosynthesis, are unable to grow in the absence of exogenous uracil. A copy
of the wild-type gene (ura4+ in S. pombe or URA3 in S. cerevisiae) carried on
the vector will complement this defect in transformed cells.

• Adenine: Laboratory strains carrying a deficiency in adenine synthesis
may be complemented by a vector carrying the wild-type gene, ADE2.

• Amino acids: Vectors carrying the wild-type genes for LEU2, TRP1, HIS3 or
LYS2 may be used to complement strains of yeast deficient in these genes.

• Zeocin: this new bleomycin-family antibiotic intercalates into the DNA and
cleaves it. The Zeocin resistance gene encodes a 13,665 dalton protein. This
protein confers resistance to Zeocin by binding to the antibiotic and
preventing it from binding DNA. Zeocin is effective on most aerobic cells and
can be used for selection in mammalian cell lines, yeast, and bacteria.






Transgenic cells 



In one aspect of the invention, the concatemers comprising the multitude of
cassettes are introduced into a host cell, in which the concatemers can be
maintained and the expressible nucleotide sequences can be expressed in a
co-ordinated way. Owing to their uniform structure, preferably with concatemer
restriction sites between the cassettes, the cassettes comprised in the concatemers
may be isolated from the host cell and re-assembled.
The host cells selected for this purpose are preferably cultivable under standard
laboratory conditions using standard culture conditions, such as standard media and
protocols. Preferably the host cells comprise a substantially stable cell line, in which
the concatemers can be maintained for generations of cell division. Standard
techniques for transformation of the host cells, and in particular methods for insertion
of artificial chromosomes into the host cells, are known.

It is also of advantage if the host cells are capable of undergoing meiosis to perform
sexual recombination. It is also advantageous that meiosis is controllable through
external manipulations of the cell culture. One especially advantageous host cell
type is one where the cells can be manipulated through external manipulations into
different mating types.

The genomes of a number of species have already been sequenced more or less
completely, and the sequences can be found in databases. The list of species for
which the whole genome has been sequenced increases constantly. Preferably the
host cell is selected from the group of species for which the whole genome or
essentially the whole genome has been sequenced. The host cell should preferably
be selected from a species that is well described in the literature with respect to
genetics, metabolism and physiology, such as a model organism used for genomics
research.

The host organism should preferably be conditionally deficient in the ability to
undergo homologous recombination. The host organism should preferably have a
codon usage similar to that of the donor organisms. Furthermore, in the case of
genomic DNA, if eukaryotic donor organisms are used, it is preferable that the host
organism has the ability to process the donor messenger RNA properly, e.g., splice
out introns.

The host cells can be bacterial, archaebacterial, or eukaryotic and can constitute a
homogeneous cell line or mixed culture. Suitable cells include the bacterial and
eukaryotic cell lines commonly used in genetic engineering and protein expression.

Preferred prokaryotic host organisms may include but are not limited to Escherichia
coli, Bacillus subtilis, B. licheniformis, B. cereus, Streptomyces lividans,
Streptomyces coelicolor, Pseudomonas aeruginosa, Myxococcus xanthus,
Rhodococcus, Streptomycetes, Actinomycetes, Corynebacteria, Bacillus,
Pseudomonas, Salmonella, and Erwinia. The complete genome sequences of E.
coli and Bacillus subtilis are described by Blattner et al., Science 277, 1454-1462
(1997) and Kunst et al., Nature 390, 249-256 (1997).

Preferred eukaryotic host organisms are mammals, fish, insects, plants, algae and
fungi.

Examples of mammalian cells include those from, e.g., monkey, mouse, rat,
hamster, primate, and human, both cell lines and primary cultures. Preferred
mammalian host cells include but are not limited to those derived from humans,
monkeys and rodents, such as Chinese hamster ovary (CHO) cells, NIH/3T3, COS,
293, VERO, HeLa etc. (see Kriegler M. in "Gene Transfer and Expression: A
Laboratory Manual", New York, Freeman & Co. 1990), and stem cells, including
embryonic stem cells and hemopoietic stem cells, zygotes, fibroblasts, lymphocytes,
kidney, liver, muscle, and skin cells.

Examples of insect cells include lepidopteran cells of the kind used with baculovirus.


Examples of plant cells include maize, rice, wheat, cotton, soybean, and sugarcane.
Plant cells such as those derived from Nicotiana and Arabidopsis are preferred.

Examples of fungi include Penicillium, Aspergillus, such as Aspergillus nidulans,
Podospora, Neurospora, such as Neurospora crassa, Saccharomyces, such as
Saccharomyces cerevisiae (budding yeast), Schizosaccharomyces, such as
Schizosaccharomyces pombe (fission yeast), Pichia spp., such as Pichia pastoris,
and Hansenula polymorpha (methylotrophic yeasts).

In a preferred embodiment the host cell is a yeast cell, and an illustrative and non-limiting
list of suitable yeast host cells comprises: baker's yeast, Kluyveromyces
marxianus, K. lactis, Candida utilis, Phaffia rhodozyma, Saccharomyces boulardii,
Pichia pastoris, Hansenula polymorpha, Yarrowia lipolytica, Candida paraffinica,
Schwanniomyces castellii, Pichia stipitis, Candida shehatae, Rhodotorula glutinis,
Lipomyces lipofer, Cryptococcus curvatus, Candida spp. (e.g. C. palmioleophila),
Yarrowia lipolytica, Candida guilliermondii, Candida, Rhodotorula spp.,
Saccharomycopsis spp., Aureobasidium pullulans, Candida brumptii, Candida
hydrocarbofumarica, Torulopsis, Candida tropicalis, Saccharomyces cerevisiae,
Rhodotorula rubra, Candida flaveri, Eremothecium ashbyii, Pichia spp., Pichia
pastoris, Kluyveromyces, Hansenula, Kloeckera, Pichia, Pachysolen spp., or
Torulopsis bombicola.

The choice of host will depend on a number of factors related to the intended
use of the engineered host, including pathogenicity, substrate range, environmental
hardiness, presence of key intermediates, ease of genetic manipulation, and
likelihood of promiscuous transfer of genetic information to other organisms.
Particularly advantageous hosts are E. coli, lactobacilli, Streptomycetes,
Actinomycetes, Saccharomyces and filamentous fungi.

In any one host cell it is possible to make all sorts of combinations of expressible
nucleotide sequences from all possible sources. Furthermore, it is possible to make
combinations of promoters and/or spacers and/or introns and/or terminators in
combination with one and the same expressible nucleotide sequence.

Thus in any one cell there may be expressible nucleotide sequences from two
different expression states. Furthermore, these two different expression states may be
from one species or, advantageously, from two different species. Any one host cell
may also comprise expressible nucleotide sequences from at least three species,
such as from at least four, five, six, seven, eight, nine or ten species, or from more
than 15 species, such as from more than 20 species, for example from more than 30,
40 or 50 species, such as from more than 100 different species, for example from
more than 300 different species, such as from more than 500 different species, for
example from more than 1000 different species, thereby obtaining combinations of
large numbers of expressible nucleotide sequences from a large number of species.

In this way potentially unlimited numbers of combinations of expressible nucleotide
sequences can be combined across different expression states. These different
expression states may represent at least two different tissues, such as at least two
organs, such as at least two species, such as at least two genera. The different
species may be from at least two different phyla, such as from at least two different
classes, such as from at least two different divisions, more preferably from at least
two different sub-kingdoms, such as from at least two different kingdoms.

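The size of this combinatorial space can be illustrated with a binomial coefficient. In the simplified model below, a cell receives k distinct cassettes drawn from a pool of n available expressible nucleotide sequences, and the order of cassettes on the concatemer is ignored, giving C(n, k) possible gene sets. The function name and the pool and cassette numbers in the usage lines are hypothetical.

```python
import math


def distinct_gene_sets(pool_size: int, cassettes_per_cell: int) -> int:
    """Number of unordered sets of distinct cassettes drawn from the pool."""
    return math.comb(pool_size, cassettes_per_cell)


# Hypothetical: 50 cassettes per cell drawn from a pool of 10,000 cDNAs.
n_sets = distinct_gene_sets(10_000, 50)
print(f"about 10^{len(str(n_sets)) - 1} possible gene sets")
```

Taking cassette order, promoter choice or several concatemers per cell into account would only enlarge this number.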
Any two of these species may be from two different classes, such as from two
different divisions, more preferably from two different sub-kingdoms, such as from two
different kingdoms. Thus expressible nucleotide sequences may be combined from
a eukaryote and a prokaryote into one and the same cell.

According to another embodiment of the invention, the expressible nucleotide
sequences may be from one and the same expression state. The products of these
sequences may interact with the products of the genes in the host cell and form new
enzyme combinations leading to novel biochemical pathways. Furthermore, by
putting the expressible nucleotide sequences under the control of a number of
promoters, it becomes possible to switch groups of genes on and off in a co-ordinated
manner. By doing this with expressible nucleotide sequences from only one expression
state, novel combinations of genes are also expressed.

The number of concatemers in one single cell may be at least one concatemer per
cell, preferably at least 2 concatemers per cell, more preferably 3 per cell, such as 4
per cell, more preferably 5 per cell, such as at least 5 per cell, for example at least 6
per cell, such as 7, 8, 9 or 10 per cell, for example more than 10 per cell. As
described above, each concatemer may preferably comprise up to 1000 cassettes,
and it is envisaged that one concatemer may comprise up to 2000 cassettes. By
inserting up to 10 concatemers into one single cell, this cell may thus be enriched
with up to 20,000 heterologous expressible genes, which under suitable conditions
may be turned on and off by regulation of the regulatable promoters.



Often it is more preferable to provide cells having anywhere between 10 and 1000
heterologous genes, such as 20-900 heterologous genes, for example 30 to 800
heterologous genes, such as 40 to 700 heterologous genes, for example 50 to 600
heterologous genes, such as from 60 to 300 heterologous genes or from 100 to 400
heterologous genes, which are inserted as 2 to 4 artificial chromosomes each
containing one concatemer of genes. The genes may advantageously be located on
1 to 10, such as from 2 to 5, different concatemers in the cells. Each concatemer may
advantageously comprise from 10 to 1000 genes, such as from 10 to 750 genes,
such as from 10 to 500 genes, such as from 10 to 200 genes, such as from 20 to
100 genes, for example from 30 to 60 genes, or from 50 to 100 genes.

The concatemers may be inserted into the host cells according to any known
transformation technique, preferably according to such transformation techniques
that ensure stable and not transient transformation of the host cell. The concatemers
may thus be inserted as an artificial chromosome which is replicated by the cells as
they divide, or they may be inserted into the chromosomes of the host cell. The
concatemer may also be inserted in the form of a plasmid, such as a plasmid vector,
a phage vector, a viral vector or a cosmid vector, that is replicated by the cells as they
divide. Any combination of the three insertion methods is also possible. One or more
concatemers may thus be integrated into the chromosome(s) of the host cell and
one or more concatemers may be inserted as plasmids or artificial chromosomes.
One or more concatemers may be inserted as artificial chromosomes and one or
more may be inserted into the same cell via a plasmid.



Examples 
Example 1 


In Examples 1-3, an AscI site was introduced into the EcoRI site of pYAC4
(Sigma; Burke DT et al. 1987, Science vol 236, p 806) so that its sticky ends match the
AscI site (= RS2 in the general formula of this patent) of the cassettes in the pEVE vectors.



Preparation of EVACs (EVolvable Artificial Chromosomes) including size
fractionation

preparation of pYAC4-Asc arms 

1. inoculate 150 ml of LB (Sigma) with a single colony of E. coli DH5α containing
pYAC4-Asc

2. grow to OD600 ~1, harvest cells and make plasmid preparation

3. digest 100 µg pYAC4-Asc w. BamHI and AscI

4. dephosphorylate fragments and heat inactivate phosphatase (20 min, 80 °C)

5. purify fragments (e.g. Qiaquick Gel Extraction Kit)

6. run 1% agarose gel to estimate amount of fragment



Preparation of expression cassettes

1. take 100 µg of plasmid preparation from each of the following libraries

a) pMA-CAR

b) pCA-CAR

c) Phaffia cDNA library

d) Carrot cDNA library

2. digest w. SrfI (10 units/prep, 37 °C, overnight)

3. dephosphorylate (10 units/prep, 37 °C, 2 h)

4. heat inactivate 80 °C, 20 min

5. concentrate and change buffer (precipitation or ultrafiltration)

6. digest w. AscI (10 units/prep, 37 °C, overnight)

7. adjust volume of preps to 100 µL






preparation of EVACs 

Different types of EVACs have been made by varying the ratio of the different
libraries that go into the ligation reaction.





EVAC   pMA-CAR   pCA-CAR   Phaffia cDNA   Carrot cDNA
A      40%       40%       10%            10%
B      25%       25%       25%            25%


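The EVAC ratios can be turned into pipetting volumes. Each cassette prep was 100 µg adjusted to 100 µL, so the sketch defaults to 1 µg/µL; this assumes no loss during digestion and clean-up, and a measured concentration should be passed instead where available. The function name is hypothetical; the percentages are those of the EVAC table, and the 100 µg total matches the cassette mixture in step 1 of the ligation.

```python
# Percentages from the EVAC table (pMA-CAR, pCA-CAR, Phaffia cDNA, Carrot cDNA).
EVAC_PERCENT = {
    "A": {"pMA-CAR": 40, "pCA-CAR": 40, "Phaffia cDNA": 10, "Carrot cDNA": 10},
    "B": {"pMA-CAR": 25, "pCA-CAR": 25, "Phaffia cDNA": 25, "Carrot cDNA": 25},
}


def mix_volumes_ul(evac: str, total_ug: float,
                   conc_ug_per_ul: float = 1.0) -> dict[str, float]:
    """Volume (uL) of each cassette prep needed for total_ug of the chosen EVAC mix.

    conc_ug_per_ul defaults to 1.0 (100 ug in 100 uL, assuming no losses).
    """
    percents = EVAC_PERCENT[evac]
    if sum(percents.values()) != 100:
        raise ValueError(f"EVAC {evac} percentages do not sum to 100")
    return {lib: total_ug * pct / 100 / conc_ug_per_ul for lib, pct in percents.items()}


print(mix_volumes_ul("A", 100))
# {'pMA-CAR': 40.0, 'pCA-CAR': 40.0, 'Phaffia cDNA': 10.0, 'Carrot cDNA': 10.0}
```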

1. add ~100 ng of pYAC4-Asc arms per 100 µg of cassette mixture

2. concentrate to < 33.5 µL

3. add 2.5 units of T4 DNA ligase + 4 µL 10x ligase buffer; adjust to 40 µL

4. ligate 3 h, 16 °C

5. stop reaction by adding 2 µL of 500 mM EDTA

6. bring reaction volume to 125 µL, add 25 µL loading mix, heat at 60 °C for 5
min

7. distribute evenly in 10 wells of a 1% LMP agarose gel

8. run pulsed field gel (CHEF III, 1% LMP agarose, ½ strength TBE (BioRad),
angle 120°, temperature 12 °C, voltage 5.6 V/cm, switch time ramping 5-25 s,
run time 30 h)
9. stain part of the gel that contains molecular weight markers + 1 sample lane
for quality check

10. cut remaining 9 sample lanes corresponding to mw. 97-194 kb (fraction 1);
194-291 kb (fraction 2); 291-365 kb (fraction 3) from the gel

11. agarase gel in high NaCl agarase buffer, 1 U agarase / 100 µg gel, 40 °C, 3 h

12. concentrate preparation to < 20 µL

13. transform suitable yeast strain w. preparation using alkali/cation
transformation

14. plate on selective minimal media plates
15. incubate at 30 °C for 4-5 days

16. pick colonies

17. analyse colonies

Example 2 

Preparation of EVACs (EVolvable Artificial Chromosomes) with direct
transformation

preparation of pYAC4-Asc arms 

1. inoculate 150 ml of LB with a single colony of DH5α containing pYAC4-Asc

2. grow to OD600 ~1, harvest cells and make plasmid preparation
3. digest 100 µg pYAC4-Asc w. BamHI and AscI

4. dephosphorylate fragments and heat inactivate phosphatase (20 min, 80 °C)

5. purify fragments (e.g. Qiaquick Gel Extraction Kit)

6. run 1% agarose gel to estimate amount of fragment



Preparation of expression cassettes

1. take 100 µg of plasmid preparation from each of the following libraries

a) pMA-CAR

b) pCA-CAR

c) Phaffia cDNA library
d) Carrot cDNA library

2. digest w. SrfI (10 units/prep, 37 °C, overnight)

3. dephosphorylate (10 units/prep, 37 °C, 2 h)

4. heat inactivate 80 °C, 20 min

5. concentrate and change buffer (precipitation or ultrafiltration)
6. digest w. AscI (10 units/prep, 37 °C, overnight)

7. adjust volume of preps to 100 µL






preparation of EVACs 

Different types of EVACs have been made by varying the ratio of the different
libraries that go into the ligation reaction.





EVAC | pMA-CAR | pCA-CAR | Phaffia cDNA | Carrot cDNA
A | 40% | 40% | 10% | 10%
B | 25% | 25% | 25% | 25%
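As a worked example of the table, the sketch below splits a total amount of cassette DNA between the four libraries for EVAC types A and B. It assumes the percentages are mass fractions (the text does not say whether they are by mass or by moles), and the 20 µg total used in the usage lines is hypothetical.

```python
# EVAC mixes from the table (percent of total cassette DNA)
EVAC_MIXES = {
    "A": {"pMA-CAR": 40, "pCA-CAR": 40, "Phaffia cDNA": 10, "Carrot cDNA": 10},
    "B": {"pMA-CAR": 25, "pCA-CAR": 25, "Phaffia cDNA": 25, "Carrot cDNA": 25},
}


def split_by_percent(total_ug: float, percents: dict[str, float]) -> dict[str, float]:
    """Return the amount (µg) of each library for a given total, assuming mass fractions."""
    if abs(sum(percents.values()) - 100) > 1e-9:
        raise ValueError("percentages must sum to 100")
    return {name: total_ug * pct / 100 for name, pct in percents.items()}


if __name__ == "__main__":
    for evac, mix in EVAC_MIXES.items():
        print(evac, split_by_percent(20.0, mix))  # 20 µg total is hypothetical
```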



1. concentrate to < 32 µL

2. add 1 unit of T4 DNA ligase and 4 µL 10x ligase buffer; adjust to 40 µL

3. ligate for 2 h at 16 °C

4. stop the reaction by adding 2 µL of 500 mM EDTA; heat-inactivate (60 °C, 20 min)

5. bring the reaction volume to 500 µL with dH2O; concentrate to 30 µL

6. add 10 U AscI and 4 µL 10x AscI buffer; bring to 40 µL

7. incubate at 37 °C for 1 h (alternatively 15 min or 30 min)

8. heat-inactivate (60 °C, 20 min)

9. add 2 µg YAC4-Asc arms, 1 U T4 DNA ligase and 10 µL 10x ligase buffer; bring to 100 µL

10. incubate overnight at 16 °C

11. add water to 500 µL

12. concentrate to 25 µL

13. transform a suitable yeast strain with the preparation using alkali/cation transformation or another suitable transformation method

14. plate on selective minimal media plates

15. incubate at 30 °C for 4-5 days

16. pick colonies

17. analyse colonies
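The buffer additions in this protocol follow the usual C1V1 = C2V2 dilution rule. The short check below confirms that each 10x buffer ends up at 1x and estimates the EDTA concentration after step 4, assuming the 2 µL of EDTA is added to the full 40 µL ligation.

```python
def final_conc(stock: float, added_ul: float, total_ul: float) -> float:
    """Final concentration after adding added_ul of stock to reach total_ul (C2 = C1*V1/V2)."""
    if total_ul <= 0 or added_ul > total_ul:
        raise ValueError("added volume must fit inside the total volume")
    return stock * added_ul / total_ul


if __name__ == "__main__":
    print("ligation buffer, step 2:", final_conc(10, 4, 40), "x")
    print("AscI buffer, step 6:", final_conc(10, 4, 40), "x")
    print("ligation buffer, step 9:", final_conc(10, 10, 100), "x")
    # step 4: 2 µL of 500 mM EDTA added to a 40 µL reaction (42 µL total)
    print(f"EDTA after step 4: {final_conc(500, 2, 42):.1f} mM")
```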
Example 3

Preparation of EVACs (EVolvable Artificial Chromosomes) (small-scale preparation)



Preparation of expression cassettes 

1. inoculate 5 ml of LB medium (Sigma) with a library inoculum corresponding to at least 10-fold representation of the library; grow overnight

2. make a plasmid miniprep from 1.5 ml of culture (e.g. QIAprep Spin Miniprep Kit)

3. digest the plasmid with SrfI

4. dephosphorylate the fragments and heat-inactivate the phosphatase (20 min, 80 °C)

5. digest with AscI

6. run 1/10 of the reaction on a 1% agarose gel to estimate the amount of fragment
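Step 1 asks for an inoculum covering the library at least 10-fold. Under a simple Poisson model (an assumption: every clone is equally abundant and cells are sampled at random), the expected fraction of clones present at least once is 1 - e^(-c) for c-fold coverage. A minimal sketch:

```python
import math


def fraction_represented(fold: float) -> float:
    """Expected fraction of clones sampled at least once at `fold` coverage (Poisson model)."""
    return 1.0 - math.exp(-fold)


def cells_needed(independent_clones: int, fold: float = 10.0) -> int:
    """Number of cells to inoculate for the requested fold representation."""
    return math.ceil(independent_clones * fold)


if __name__ == "__main__":
    for fold in (1, 3, 10):
        print(f"{fold:>2}-fold: {fraction_represented(fold):.5f} of clones expected present")
    print(cells_needed(50_000))  # 50,000 independent clones is a hypothetical library size
```

At 10-fold coverage fewer than 1 in 20,000 clones is expected to be missing, which is why a modest overshoot is usually sufficient.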

preparation of pYAC4-Asc arms

1. inoculate 150 ml of LB with a single colony of E. coli DH5α containing pYAC4-Asc

2. grow to OD600 ~1, harvest the cells and make a plasmid preparation

3. digest 100 µg pYAC4-Asc with BamHI and AscI

4. dephosphorylate the fragments and heat-inactivate the phosphatase (20 min, 80 °C)

5. purify the fragments (e.g. QIAquick Gel Extraction Kit)

6. run a 1% agarose gel to estimate the amount of fragment

preparation of EVACs 

1. mix the expression cassette fragments with the YAC arms so that the cassette/arm ratio is ~1000/1

2. if needed, concentrate the mixture (e.g. using Microcon YM30) so that the fragment concentration is > 75 ng/µL of reaction

3. add 1 U T4 DNA ligase and incubate at 16 °C for 1-3 h; stop the reaction by adding 1 µL of 500 mM EDTA

4. run a pulsed-field gel (CHEF III, 1% LMP agarose, ½-strength TBE, angle 120°, temperature 12 °C, voltage 5.6 V/cm, switch time ramping 5-25 s, run time 30 h); load the sample in 2 lanes

5. stain the part of the gel that contains the molecular weight markers

6. cut the sample lanes corresponding to MW 100-200 kb

7. digest the gel with agarase in high-NaCl agarase buffer (1 U agarase / 100 mg gel)

8. concentrate the preparation to < 20 µL

9. transform a suitable yeast strain with the preparation using electroporation

10. plate on selective minimal media plates

11. incubate at 30 °C for 4-5 days

12. pick colonies
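Step 1 specifies a cassette/arm ratio of ~1000/1 without saying whether it is by mass or by moles. The sketch below assumes a molar ratio and converts it to nanograms using the common approximation of 650 g/mol per base pair; the fragment amounts and lengths in the usage lines are hypothetical. It also gives the largest reaction volume that still keeps the fragments above the 75 ng/µL named in step 2.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of double-stranded DNA (approximation)


def pmol(ng: float, length_bp: int) -> float:
    """Picomoles of a double-stranded fragment given its mass (ng) and length (bp)."""
    return ng * 1000.0 / (length_bp * AVG_BP_MASS)


def cassette_ng_for_ratio(arm_ng: float, arm_bp: int, cassette_bp: int, ratio: float = 1000.0) -> float:
    """Mass of cassette DNA giving `ratio` cassettes per arm molecule (molar ratio assumed)."""
    return ratio * arm_ng * cassette_bp / arm_bp


def max_volume_ul(fragment_ng: float, min_ng_per_ul: float = 75.0) -> float:
    """Largest reaction volume that keeps the fragment concentration at or above the minimum."""
    return fragment_ng / min_ng_per_ul


if __name__ == "__main__":
    arm_ng, arm_bp, cassette_bp = 10.0, 6000, 3000  # hypothetical amounts and lengths
    cassette_ng = cassette_ng_for_ratio(arm_ng, arm_bp, cassette_bp)
    print(f"{cassette_ng:.0f} ng cassettes; ratio check:",
          round(pmol(cassette_ng, cassette_bp) / pmol(arm_ng, arm_bp)))
    print(f"keep the reaction at or below {max_volume_ul(cassette_ng + arm_ng):.1f} µL")
```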

Example 4: cDNA libraries used in the production of EVACs 

1. Daucus carota, carrot root library:

• Full length 

• Oligo dT primed, directional cDNA library 

• cDNA library made using a pool of 3 Evolva EVE 4, 5 & 8 vectors (Fig. 4, 5, 6)

• Number of independent clones: 41.6 x 10^

• Average size: 0.9-2.9 kb

• Number of different genes present: 5,000-10,000

2. Xanthophyllomyces dendrorhous (yeast), whole-organism library

• Full length 

• Oligo dT primed, directional cDNA library

• cDNA library made using a pool of 3 Evolva EVE 4, 5 & 8 vectors (Fig. 4, 5, 6)

• Number of independent clones: 48.0 x 10^

• Average size: 1.0-3.8 kb

• Number of different genes present: 5,000-10,000

3. Target carotenoid gene cDNA library 

• Full length and normalised 

• Directional cDNA cloning 

• Library made by cloning each gene individually in 2 Evolva EVE 4, 5 & 8 vectors (Fig. 4, 5, 6)

• Number of different genes: 48

• Species and genes used: 

• Gentiana sp., ggps, psy, pds, zds, lcy-b, lcy-e, bhy, zep

• Rhodobacter capsulatus, idi, crtC, crtF 
• Erwinia uredovora, crtE, crtB, crtI, crtY, crtZ

• Nostoc anabaena, zds 

• Synechococcus PCC7942, pds 

• Erwinia herbicola, crtE, crtB, crtI, crtY, crtZ 
• Staphylococcus aureus, crtM, crtN

• Xanthophyllomyces dendrorhous, crtI, crtYb 

• Capsicum annuum, ccs, crtL 

• Nicotiana tabacum, crtL, bchy 

• Prochlorococcus sp., lcy-b, lcy-e
• Saccharomyces cerevisiae, idi

• Corynebacterium sp., crtI, crtYe, crtYf, crtEb 

• Lycopersicon esculentum, psy-1 

• Neurospora crassa, all 



Example 5: Transformation of EVACs
Example 5a: Transformation 

1. Inoculate a single colony into 100 ml YPD broth and grow with aeration at 30 °C to mid-log phase, 2 x 10^ to 2 x 10^ cells/ml.

2. Spin to pellet the cells at 400 x g for 5 minutes; discard the supernatant.

3. Resuspend the cells in a total of 9 ml TE, pH 7.5. Spin to pellet the cells and discard the supernatant.

4. Gently resuspend the cells in 5 ml 0.1 M lithium/cesium acetate solution, pH 7.5.

5. Incubate at 30 °C for 1 hour with gentle shaking.

6. Spin at 400 x g for 5 minutes to pellet the cells and discard the supernatant.

7. Gently resuspend in 1 ml TE, pH 7.5. The cells are now ready for transformation.

8. In a 1.5 ml tube, combine:

• 100 µl yeast cells

• 5 µl carrier DNA (10 mg/ml)

• 5 µl histamine solution

• 1/5 of an EVAC preparation in a volume of at most 10 µl (one EVAC preparation is made from 100 µg of concatenation reaction mixture)

9. Gently mix and incubate at room temperature for 30 minutes. 
10. In a separate tube, combine 0.8 ml 50% (w/v) PEG 4000, 0.1 ml TE and 0.1 ml of 1 M LiAc for each transformation reaction. Add 1 ml of this PEG/TE/LiAc mix to each transformation reaction. Mix the cells into the solution with gentle pipetting.

11. Incubate at 30 °C for 1 hour.

12. Heat shock at 42 °C for 15 minutes; cool to 30 °C.

13. Pellet the cells in a microcentrifuge at high speed for 5 seconds and remove the supernatant.

14. Resuspend in 200 µl of rich medium and plate on appropriate selective medium.

15. Incubate at 30 °C for 48-72 hours until transformant colonies appear.
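The PEG/TE/LiAc mix in step 10 can be checked with the same dilution arithmetic. The sketch computes the PEG and LiAc concentrations in the mix and, assuming the step 8 reaction is at its maximum of 120 µl (100 + 5 + 5 + 10 µl), the PEG concentration once 1 ml of mix has been added; it ignores any volume change from dissolving PEG.

```python
def diluted(stock: float, part_ml: float, total_ml: float) -> float:
    """Concentration of a component after dilution (C2 = C1*V1/V2)."""
    return stock * part_ml / total_ml


mix_ml = 0.8 + 0.1 + 0.1                     # PEG + TE + LiAc, step 10
peg_in_mix = diluted(50.0, 0.8, mix_ml)       # % (w/v) PEG 4000
liac_in_mix = diluted(1.0, 0.1, mix_ml)       # M LiAc
reaction_ml = (100 + 5 + 5 + 10) / 1000       # largest step 8 volume, in ml
peg_final = diluted(peg_in_mix, 1.0, 1.0 + reaction_ml)

if __name__ == "__main__":
    print(f"mix: {peg_in_mix:.1f}% PEG, {liac_in_mix:.2f} M LiAc")
    print(f"after adding 1 ml to the cells: {peg_final:.1f}% PEG")
```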


Example 5b: Transformation of EVACs using electroporation 

100 ml of YPD is inoculated with one yeast colony and grown to OD600 = 1.3-1.5. The culture is harvested by centrifuging at 4000 x g and 4 °C. The cells are resuspended in 16 ml sterile H2O. Add 2 ml 10x TE buffer, pH 7.5, and swirl to mix. Add 2 ml 10x lithium acetate solution (1 M, pH 7.5) and swirl to mix. Shake gently for 45 min at 30 °C. Add 1.0 ml 0.5 M DTE while swirling. Shake gently for 15 min at 30 °C. The yeast suspension is diluted to 100 ml with sterile water. The cells are washed and concentrated by centrifuging at 4000 x g, resuspending the pellet in 50 ml ice-cold sterile water, centrifuging at 4000 x g, resuspending the pellet in 5 ml ice-cold sterile water, centrifuging at 4000 x g and resuspending the pellet in 0.1 ml ice-cold sterile 1 M sorbitol. Electroporation is performed using a Bio-Rad Gene Pulser. In a sterile 1.5-ml microcentrifuge tube, 40 µl of concentrated yeast cells are mixed with 5 µl of 1:10 diluted EVAC preparation. The yeast-DNA mix is transferred to an ice-cold 0.2-cm-gap disposable electroporation cuvette and pulsed at 1.5 kV, 25 µF, 200 Ω. 1 ml of ice-cold 1 M sorbitol is added to the cuvette to recover the yeast. Aliquots are spread on selective plates containing 1 M sorbitol. Incubate at 30 °C until colonies appear.
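Two derived quantities help when transferring this protocol to another electroporator: the field strength (voltage divided by cuvette gap) and the nominal RC time constant (resistance times capacitance). The sketch below computes both for the settings given, plus the DTE concentration during pre-treatment (1.0 ml of 0.5 M DTE added to 20 ml of suspension). The time constant is nominal: the measured value is usually shorter because the sample's own resistance acts in parallel with the 200 Ω setting.

```python
def field_strength_kv_per_cm(voltage_kv: float, gap_cm: float) -> float:
    return voltage_kv / gap_cm


def time_constant_ms(resistance_ohm: float, capacitance_uf: float) -> float:
    # tau = R * C; ohm * microfarad = microseconds, so divide by 1000 for ms
    return resistance_ohm * capacitance_uf / 1000.0


def final_molar(stock_m: float, added_ml: float, start_ml: float) -> float:
    return stock_m * added_ml / (start_ml + added_ml)


if __name__ == "__main__":
    print(f"field: {field_strength_kv_per_cm(1.5, 0.2):.1f} kV/cm")
    print(f"tau: {time_constant_ms(200, 25):.1f} ms (nominal)")
    # 16 ml water + 2 ml 10x TE + 2 ml 10x LiAc = 20 ml before DTE is added
    print(f"DTE: {1000 * final_molar(0.5, 1.0, 20.0):.1f} mM")
```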

Example 6: Rare restriction enzymes with recognition sequences and cleavage points

In this example, rare restriction enzymes are listed together with their recognition sequences and cleavage points. (^) indicates a cleavage point in the 5'-3' sequence and (_) indicates a cleavage point in the complementary sequence.



W = A or T; R = A or G; Y = C or T; N = A, C, G or T



6a) Unique, palindromic overhang

AscI GG^CGCG_CC
AsiSI GCG_AT^CGC
CciNI GC^GGCC_GC
CspBI GC^GGCC_GC
FseI GG_CCGG^CC
MchAI GC^GGCC_GC
NotI GC^GGCC_GC
PacI TTA_AT^TAA
SbfI CC_TGCA^GG
SdaI CC_TGCA^GG
SgfI GCG_AT^CGC
SgrAI CR^CCGG_YG
Sse232I CG^CCGG_CG
Sse8387I CC_TGCA^GG

6b) No overhang

BstRZ246I ATTT^AAAT
BstSWI ATTT^AAAT
MspSWI ATTT^AAAT
MssI GTTT^AAAC
PmeI GTTT^AAAC
SmiI ATTT^AAAT
SrfI GCCC^GGGC
SwaI ATTT^AAAT


6c) Non-palindromic and/or variable overhang

AarI CACCTGCNNNN^NNNN_

AbeI CC^TCA_GC

Al oi ^NNimw^isnsiisiN]^^ 

Baei ^]sn^]NNN_]snsnsi^^ 

BbvCI CC^TCA_GC

CpoI CG^GWC_CG

CspI CG^GWC_CG

Pfl27I RG^GWC_CY

pp i I '^N]snsi]snsr_isiNisi]^^ 

PpuMI RG^GWC_CY

PpuXI RG^GWC_CY

Psp5II RG^GWC_CY

PspPPI RG^GWC_CY

RsrII CG^GWC_CG

Rsr2I CG^GWC_CG

SanDI GG^GWC_CC

SapI GCTCTTCN^NNN_

SdiI GGCCN_NNN^NGGCC

SexAI A^CCWGG_T

SfiI GGCCN_NNN^NGGCC

Sse1825I GG^GWC_CC

Sse8647I AG^GWC_CT

VpaK32I GCTCTTCN^NNN_
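The recognition sequences above use IUPAC codes (W, R, Y, N). The sketch below shows one way to search a DNA sequence for such a site and to estimate how often it would occur by chance in random sequence with equal base frequencies (a factor of 4 for a fixed base, 2 for W/R/Y and 1 for N, multiplied over the site). It is illustrative only: it scans the given strand, so for non-palindromic sites such as SapI or AarI the reverse complement would also need to be searched.

```python
import math
import re

# IUPAC codes used in the tables above
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "N": "[ACGT]"}
OPTIONS = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2, "W": 2, "N": 4}


def bare_site(site: str) -> str:
    """Strip the cleavage marks (^ and _) from a recognition sequence."""
    return site.replace("^", "").replace("_", "").upper()


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) match on the given strand."""
    pattern = "".join(IUPAC[base] for base in bare_site(site))
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


def expected_spacing_bp(site: str) -> int:
    """Average distance between chance occurrences in random sequence (equal base frequencies)."""
    return math.prod(4 // OPTIONS[base] for base in bare_site(site))


if __name__ == "__main__":
    print(find_sites("TTGGCGCGCCAA", "GG^CGCG_CC"))  # AscI
    for name, site in [("AscI", "GG^CGCG_CC"), ("SgrAI", "CR^CCGG_YG"), ("SfiI", "GGCCN_NNN^NGGCC")]:
        print(name, expected_spacing_bp(site))
```

An 8-base site such as AscI is expected about once every 65,536 bp of random sequence, which is what makes these enzymes suitable for cutting between cassettes without cutting within them.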
6d) Meganucleases

I-SceI TAGGGATAA^CAGG^GTAAT
I-CeuI ACGGTC_CTAA^GGTAG
I-CreI AAACGTC_GTGA^GACAGTTT
I-SceII GGTC_ACCC^TGAAGTA
I-SceIII GTTTTGG_TAAC^TATTTAT
Endo.SceI GATGCTGC_AGGC^ATAGGCTTGTTTA
PI-SceI GG_GTGC^GGAGAA
PI-PspI TGGCAAACAGCTA_TTAT^GGGTATTATGGGT
I-PpoI CTCTC_TTAA^GGTAG
HO TTTCCGC^AACA^GT
I-TevI NNNN^NNTCAGTAGATGTTTTTCTTGGTCTACCGTTT

More meganucleases have been identified, but their precise recognition sequences have not been determined; see e.g. www.meganuclease.com.



Example 7: Concatemer size limitation experiments (use of stoppers)

Materials used: 

pYAC4 (Sigma; Burke et al. 1987, Science, vol. 236, p. 806) was digested with EcoRI and BamHI and dephosphorylated.
pSE420 (Invitrogen) was linearised using EcoRI and used as the model fragment for concatenation.

T4 DNA ligase (Amersham Pharmacia Biotech) was used for ligation according to the manufacturer's instructions.

Method: Fragments and arms were mixed in the ratios (concentrations in arbitrary units) indicated in Figures 9a and 9b. Ligation was allowed to proceed for 1 h at 16 °C. The reaction was stopped by the addition of 1 µL 500 mM EDTA. Products were analysed by standard agarose gel electrophoresis (1% agarose, ½-strength TBE) or by PFGE (CHEF III, 1% LMP agarose, ½-strength TBE, angle 120°, temperature 12 °C, voltage 5.6 V/cm, switch time ramping 5-25 s, run time 30 h).

The results are shown in Figure 9, which shows that the size of the concatemers is proportional to the ratio of cassettes to YAC arms.

Example 8: Integration of expression cassettes into artificial chromosomes
Integration of expression cassettes into YAC12 was done essentially as described by Sears D.D., Hieter P., Simchen G., Genetics, 1994, 138, 1055-1065.

An AscI site was introduced into the BglII site of the integration vectors pGS534 and pGS525.

A β-galactosidase gene, as well as crtE, crtB, crtI and crtY from Erwinia uredovora, were cloned into pEVE4. These expression cassettes were ligated into the AscI site of the modified integration vectors pGS534 and pGS525.

Linearised pGS534 and pGS525 containing the expression cassettes were transformed into haploid yeast strains containing the appropriate target YAC, which carries the ADE2 gene. Red Ade- transformants were selected (the parent host strain is red due to the ade2-101 mutation).

Additional confirmation of correct integration of the β-galactosidase expression cassette was obtained using a β-galactosidase assay.

Example 9: Re-transformation of cells that already contain artificial chromosomes to obtain at least 2 artificial chromosomes per cell

Yeast strains containing YAC12 (Sears D.D., Hieter P., Simchen G., Genetics, 1994, 138, 1055-1065) were transformed with EVACs following the protocol described in Example 5a. The transformed cells were plated on plates that select for cells containing both YAC12 and EVACs.



Example 10: Example of different expression patterns ("phenotypes") obtained using the same yeast clones under different expression conditions

Colonies were picked with a sterile toothpick and streaked sequentially onto plates corresponding to the four repressed and/or induced conditions (-Ura/-Trp, -Ura/-Trp/-Met, -Ura/-Trp/+200 µM CuSO4, -Ura/-Trp/-Met/+200 µM CuSO4). 20 mg adenine was added to the media to suppress the ochre phenotype.



Claims

1. A library comprising a collection of individual cells, the cells being denoted cell1, cell2, ..., celli, wherein i > 2,

each cell comprising at least one concatemer of individual oligonucleotide cassettes, each concatemer comprising a nucleotide sequence of the following formula:

[rs2-SP-PR-X-TR-SP-rs1]n

wherein

rs1 and rs2 together denote a restriction site,

SP denotes a spacer of at least two bases,
X denotes an expressible nucleotide sequence,
PR denotes a promoter, capable of regulating the expression of X in the cell,

TR denotes a terminator, and
n > 2, and

wherein at least one concatemer of cell1 is different from a concatemer of cell2.
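The formula [rs2-SP-PR-X-TR-SP-rs1]n can be read as a string-building rule. The sketch below uses placeholder A/T-only sequences rather than real parts and splits the AscI site GG^CGCGCC at its top-strand cut into rs1 = GG and rs2 = CGCGCC (an illustrative choice of enzyme). Joining n cassettes end to end then regenerates the complete site at each of the n - 1 internal junctions, which is one way to read "rs1 and rs2 together denote a restriction site".

```python
from dataclasses import dataclass

# AscI site GG^CGCGCC split at its top-strand cut (illustrative choice of enzyme)
RS1, RS2 = "GG", "CGCGCC"


@dataclass(frozen=True)
class Cassette:
    """One unit rs2-SP-PR-X-TR-SP-rs1; part sequences here are placeholders."""
    sp: str
    pr: str
    x: str
    tr: str

    def sequence(self) -> str:
        return RS2 + self.sp + self.pr + self.x + self.tr + self.sp + RS1


def concatemer(cassettes: list[Cassette]) -> str:
    """Top-strand sequence of the cassettes joined head to tail."""
    return "".join(c.sequence() for c in cassettes)


if __name__ == "__main__":
    # hypothetical A/T-only parts, so GGCGCGCC can only arise at junctions
    parts = [Cassette("TT", "TATAAA", "AATTAATT", "TTTTTT") for _ in range(3)]
    seq = concatemer(parts)
    print(seq.count(RS1 + RS2), "complete AscI sites in", len(parts), "cassettes")
```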


2. The library according to claim 1, wherein a concatemer of each cell comprises at least a first cassette and a second cassette, said first cassette being different from said second cassette.

3. The library according to claim 1, wherein substantially all cassettes of a concatemer are different.

4. The library according to claim 1, wherein substantially all cells of the library are different.



wo 02/059297 PCT/DK02/00056 


5. The library according to claim 1, said library comprising a collection of sublibraries.

6. The library according to claim 5, wherein a sublibrary is a collection of individual cells having at least one phenotype in common.

7. The library according to claim 6, wherein the at least one phenotype is selected from the group comprising the ability to grow on unusual substrates, the ability to grow on sublethal concentrations of toxins, the ability to grow at a high temperature, the ability to grow at a low temperature, the ability to grow at elevated osmolality, the ability to grow at low osmolality, the ability to grow at high salinity, the ability to grow at low salinity, the ability to grow at elevated metal concentrations, the ability to grow at high CO2 concentrations, the ability to grow at low CO2 concentrations, the ability to grow at high O2 concentrations, the ability to grow at low O2 concentrations, the ability to provide special spectral properties, the ability to provide a special colour, the ability to have a deviating GST activity and the ability to have a deviating P450 activity.

8. The library according to claim 6, wherein a sublibrary is a collection of individual cells, said cells having, for at least one identical expressible DNA sequence, different promoters.

9. The library according to claim 6, wherein a sublibrary is a collection of individual cells, each cell having, in at least one cassette of the concatemer, identical expressible DNA sequences.

10. The library according to claim 1, comprising at least 20 individual cells, such as 
at least 50 individual cells. 

11. The library according to claim 1, comprising at least 100 individual cells, such as at least 1,000 cells, for example at least 10,000 cells, such as at least 100,000 cells, for example at least 1,000,000 cells, such as at least 1,000,000,000 cells.






12. The library according to claim 1, comprising a collection of cells from one 
species. 
13. The library according to claim 12, wherein said species is selected from prokaryotic species or mutants thereof.

14. The library according to claim 13, wherein said prokaryotic species is selected from Escherichia coli, Bacillus subtilis, Streptomyces lividans, Streptomyces coelicolor, Pseudomonas aeruginosa and Myxococcus xanthus.

15. The library according to claim 14, wherein said prokaryotic species is selected from E. coli.

16. The library according to claim 1, wherein said species is selected from 
eukaryotic species or mutants thereof. 

17. The library according to claim 16, wherein said eukaryotic species is selected from mammals, insects, vertebrates, plants, fungi, yeasts; filamentous ascomycetes such as Neurospora crassa and Aspergillus nidulans; plant cells such as those derived from Nicotiana and Arabidopsis; mammalian host cells such as those derived from humans, monkeys and rodents, such as Chinese hamster ovary (CHO) cells, NIH/3T3, COS, 293, VERO and HeLa.

18. The library according to claim 16, wherein said eukaryotic species is selected from yeast or mutants thereof.

19. The library according to claim 18, said yeast being selected from the group comprising budding yeast, Kluyveromyces marxianus, K. lactis, Candida utilis, Phaffia rhodozyma, Saccharomyces boulardii, Pichia pastoris, Hansenula polymorpha, Yarrowia lipolytica, Candida paraffinica, Schwanniomyces castellii, Pichia stipitis, Candida shehatae, Rhodotorula glutinis, Lipomyces lipofer, Cryptococcus curvatus, Candida spp. (e.g. C. palmioleophila), Yarrowia lipolytica, Candida guilliermondii, Candida, Rhodotorula spp., Saccharomycopsis spp., Aureobasidium pullulans, Candida brumptii, Candida hydrocarbofumarica, Torulopsis, Candida tropicalis, Saccharomyces cerevisiae, Rhodotorula rubra, Candida flaveri, Eremothecium ashbyii, Pichia spp., Kluyveromyces, Hansenula, Kloeckera, Pichia, Pachysolen spp., or Torulopsis bombicola, or mutants thereof.

20. The library according to any of the preceding claims, wherein substantially all rs1-rs2 sequences are recognised by the same restriction enzyme, more preferably wherein substantially all rs1-rs2 sequences are substantially identical.

21. The library according to any of the preceding claims, wherein n in at least one concatemer in at least one cell, more preferably wherein n in substantially all concatemers in substantially all cells, is at least 10, such as at least 15, for example at least 20, such as at least 25, for example at least 30, such as from 30 to 60 or more than 60, such as at least 75, for example at least 100, such as at least 200, for example at least 500, such as at least 750, for example at least 1000, such as at least 1500, for example at least 2000.

22. The library according to any of the preceding claims, wherein at least one cell comprises, more preferably wherein substantially all cells comprise, 2 concatemers per cell, more preferably 3 per cell, such as 4 per cell.

23. The library according to any of the preceding claims, wherein at least one cassette in one cell comprises an intron between the promoter and the expressible nucleotide sequence, more preferably wherein substantially all cassettes in substantially all cells comprise an intron between the promoter and the expressible nucleotide sequence.

24. The library according to claim 1, wherein the difference is a difference in the spacer sequence and/or the promoter and/or the expressible nucleotide sequence and/or the intron and/or the terminator sequence.

25. The library according to claim 24, wherein the different expressible nucleotide sequences come from the same or from different expression states.






26. The library according to claim 25, wherein the different expression states 
represent at least two different tissues, such as at least two organs, such as at 
least two species, such as at least two genera. 
27. The library according to claim 26, wherein the different species are from at least two different phyla, such as from at least two different classes, such as from at least two different divisions, more preferably from at least two different subkingdoms, such as from at least two different kingdoms.

28. The library according to any of the preceding claims, wherein the nucleotide sequence of at least one concatemer, preferably the nucleotide sequences of substantially all concatemers, have been designed to minimise the level of repeat sequences in any one concatemer.

29. The library according to claim 28, wherein recombination within the expressible 
nucleotide sequences has been minimised. 

30. The library according to any of the preceding claims, wherein at least one concatemer is ligated into a plasmid or into an integration vector, such as a plasmid vector, a phage vector, a viral vector or a cosmid vector.

31. The library according to claim 30, wherein at least one concatemer is integrated into the host genome.

32. The library according to claim 30, wherein at least one concatemer is integrated into an artificial chromosome in the host cell.

33. The library according to any of the preceding claims, wherein the restriction site comprises a rare restriction site having at least 7 bases in the recognition sequence, more preferably at least 8 bases, such as at least 9 bases, for example at least 10 bases.

34. The library according to any of the preceding claims, wherein the restriction enzyme recognising the rs1-rs2 restriction site produces sticky ends upon cleavage of a double-stranded nucleotide sequence, preferably wherein the sticky ends have a pre-determined nucleotide sequence.
35. The library according to any of the preceding claims, wherein the spacer sequences together comprise at least 50 bases, such as at least 60 bases, for example at least 75 bases, such as at least 100 bases, for example at least 150 bases, such as at least 200 bases, for example at least 250 bases, such as at least 300 bases, for example at least 400 bases, such as at least 500 bases, such as at least 750 bases, for example at least 1000 bases, such as at least 1100 bases, for example at least 1200 bases, such as at least 1300 bases, for example at least 1400 bases, such as at least 1500 bases, for example at least 1600 bases, such as at least 1700 bases, for example at least 1800 bases, such as at least 1900 bases, for example at least 2000 bases, such as at least 2100 bases, for example at least 2200 bases, such as at least 2300 bases, for example at least 2400 bases, such as at least 2500 bases, for example at least 2600 bases, such as at least 2700 bases, for example at least 2800 bases, such as at least 2900 bases, for example at least 3000 bases, such as at least 3200 bases, for example at least 3500 bases, such as at least 3800 bases, for example at least 4000 bases, such as at least 4500 bases, for example at least 5000 bases, such as at least 6000 bases.

36. The library according to claim 35, wherein at least one of the spacer sequences comprises between 50 and 2500 bases, preferably between 100 and 2500 bases, preferably between 200 and 2300 bases, more preferably between 300 and 2100 bases, such as between 400 and 1900 bases, more preferably between 500 and 1700 bases, such as between 600 and 1500 bases, more preferably between 700 and 1400 bases.

37. A library comprising a collection of individual cells, the cells being denoted cell1, cell2, ..., celli, wherein i > 2,

each cell comprising at least two expression cassettes comprising a nucleotide sequence of the following formula:

[rs2-SP-PR-X-TR-SP-rs1]

wherein

rs1 and rs2 together denote a restriction site,

SP denotes a spacer of at least two bases,
PR denotes a promoter, capable of functioning in the cell,

X denotes an expressible nucleotide sequence,
TR denotes a terminator, and

wherein at least one of the expression cassettes comprises an expressible nucleotide sequence heterologous to the cell, and at least one of the cassettes of cell1 is different from the cassettes of cell2.

38. The library according to claim 37, wherein substantially all cells of the library are 
different. 

39. The library according to claim 37, said library comprising a collection of sublibraries.

40. The library according to claim 39, wherein a sublibrary is a collection of individual cells having at least one phenotype in common.

41. The library according to claim 40, wherein the at least one phenotype is selected from the group comprising the ability to grow on unusual substrates, the ability to grow on sublethal concentrations of toxins, the ability to grow at a high temperature, the ability to grow at a low temperature, the ability to grow at elevated osmolality, the ability to grow at low osmolality, the ability to grow at high salinity, the ability to grow at low salinity, the ability to grow at elevated metal concentrations, the ability to grow at high CO2 concentrations, the ability to grow at low CO2 concentrations, the ability to grow at high O2 concentrations, the ability to grow at low O2 concentrations, the ability to provide special spectral properties, the ability to provide a special colour, the ability to have a deviating GST activity and the ability to have a deviating P450 activity.

42. The library according to claim 37, wherein a sublibrary is a collection of individual cells, said cells having, for identical expressible DNA sequences, different promoters.

43. The library according to claim 42, wherein a sublibrary is a collection of individual cells, each cell having at least one cassette with identical expressible DNA sequences.

44. The library according to claim 37, comprising at least 20 individual cells. 

45. The library according to claim 37, comprising at least 50 individual cells.

46. The library according to claim 37, comprising at least 100 individual cells, such as at least 1,000 cells, for example at least 10,000 cells, such as at least 100,000 cells, for example at least 1,000,000 cells, such as at least 1,000,000,000 cells.

47. The library according to claim 37, comprising a collection of cells from one 
species. 

48. The library according to claim 47, wherein said species is selected from prokaryotic species or mutants thereof.

49. The library according to claim 48, wherein said prokaryotic species is selected from Escherichia coli, Bacillus subtilis, Streptomyces lividans, Streptomyces coelicolor, Pseudomonas aeruginosa and Myxococcus xanthus.

50. The library according to claim 49, wherein said prokaryotic species is selected from E. coli.

51. The library according to claim 47, wherein said species is selected from eukaryotic species or mutants thereof.

52. The library according to claim 51, wherein said eukaryotic species is selected from mammals, insects, vertebrates, plants, fungi, yeasts; filamentous ascomycetes such as Neurospora crassa and Aspergillus nidulans; plant cells such as those derived from Nicotiana and Arabidopsis; mammalian host cells such as those derived from humans, monkeys and rodents, such as Chinese hamster ovary (CHO) cells, NIH/3T3, COS, 293, VERO and HeLa.

53. The library according to claim 51, wherein said eukaryotic species is selected from fungi or mutants thereof.

54. The library according to claim 53, said yeast being selected from the group comprising budding yeast, Kluyveromyces marxianus, K. lactis, Candida utilis, Phaffia rhodozyma, Saccharomyces boulardii, Pichia pastoris, Hansenula polymorpha, Yarrowia lipolytica, Candida paraffinica, Schwanniomyces castellii, Pichia stipitis, Candida shehatae, Rhodotorula glutinis, Lipomyces lipofer, Cryptococcus curvatus, Candida spp. (e.g. C. palmioleophila), Yarrowia lipolytica, Candida guilliermondii, Candida, Rhodotorula spp., Saccharomycopsis spp., Aureobasidium pullulans, Candida brumptii, Candida hydrocarbofumarica, Torulopsis, Candida tropicalis, Saccharomyces cerevisiae, Rhodotorula rubra, Candida flaveri, Eremothecium ashbyii, Pichia spp., Kluyveromyces, Hansenula, Kloeckera, Pichia, Pachysolen spp., or Torulopsis bombicola, or mutants thereof.

55. The library according to any of the preceding claims 37 to 54, wherein substantially all rs1-rs2 sequences are recognised by the same restriction enzyme, more preferably wherein substantially all rs1-rs2 sequences are substantially identical.

56. The library according to any of the preceding claims 37 to 55, wherein at least one cell comprises, more preferably wherein substantially all cells comprise, at least 10 cassettes, such as at least 15, for example at least 20, such as at least 25, for example at least 30, such as from 30 to 60 or more than 60, such as at least 75, for example at least 100, such as at least 200, for example at least 500, such as at least 750, for example at least 1000, such as at least 1500, for example at least 2000, such as at least 2500, for example at least 5000, such as at least 7500, for example at least 10,000.

57. The library according to any of the preceding claims 37 to 56, wherein at least one cassette in one cell comprises, between the promoter and the expressible nucleotide sequence, an intron capable of being identified and treated as an intron in the host cell, more preferably wherein substantially all cassettes in substantially all cells comprise an intron between the promoter and the expressible nucleotide sequence.

58. The library according to claim 37, wherein the difference is a difference in the spacer sequence and/or the promoter and/or the expressible nucleotide sequence and/or the intron and/or the terminator sequence.

59. The library according to claim 58, wherein the different expressible nucleotide 
sequences come from the same or from different expression states. 



60. The library according to claim 59, wherein the different expression states represent at least two different tissues, such as at least two organs, such as at least two species, such as at least two genera.

61. The library according to claim 60, wherein the different species are from at least two different phyla, such as from at least two different classes, such as from at least two different divisions, more preferably from at least two different subkingdoms, such as from at least two different kingdoms.

62. The library according to any of the preceding claims 37 to 61, wherein the nucleotide sequence of at least one cassette, preferably the nucleotide sequences of substantially all cassettes, have been designed to minimise the level of repeat sequences in any one cassette.

63. The library according to any of the preceding claims 37 to 62, wherein recombination within the expressible nucleotide sequences has been minimised.

64. The library according to any of the preceding claims 37 to 63, wherein at least 
one cassette is ligated into a plasmid or into an integration vector, such as a 
plasmid vector, a phage vector, a viral vector or a cosmid vector. 



65. The library according to any of the preceding claims 37 to 64, wherein at least one cassette is integrated into the host genome.

66. The library according to any of the preceding claims 37 to 65, wherein at least one cassette is integrated into an artificial chromosome in the host cell.

67. A library comprising a collection of individual cells, the cells being denoted
cell1, cell2, ..., celli, wherein i > 2,

each cell comprising a random combination of heterologous oligonucleotides
having the general formula:

[PR-X]

wherein

X denotes an expressible nucleotide sequence, and

PR denotes an independently controllable promoter being operably associated
with X.

68. The library according to claim 67, wherein the random combinations are made
from a two-dimensional array of promoters and heterologous expressible
nucleotide sequences.
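As an illustration only, the random drawing of [PR-X] combinations from a two-dimensional promoter × sequence array (claims 67 to 69) can be sketched in Python. The pool names PR1 to PR3 and X1 to X4, the three cassettes per cell and the use of `random.sample` are hypothetical choices made for the sketch. They are not part of the claims.

```python
import random

# Hypothetical pools; claim 68 does not name specific promoters or genes.
PROMOTERS = ["PR1", "PR2", "PR3"]
GENES = ["X1", "X2", "X3", "X4"]

# The two-dimensional array: every promoter paired with every expressible sequence.
ARRAY = [(pr, x) for pr in PROMOTERS for x in GENES]


def random_cell(n_cassettes: int, rng: random.Random) -> list[tuple[str, str]]:
    """Draw n_cassettes distinct [PR-X] combinations from the shared array."""
    return rng.sample(ARRAY, n_cassettes)


def make_library(n_cells: int, n_cassettes: int, seed: int = 0):
    """Each cell draws its own selection from the same pool (cf. claim 69)."""
    rng = random.Random(seed)
    return [random_cell(n_cassettes, rng) for _ in range(n_cells)]


library = make_library(n_cells=5, n_cassettes=3)
for i, cell in enumerate(library, start=1):
    print(f"cell{i}:", " ".join(f"[{pr}-{x}]" for pr, x in cell))
```

Sampling without replacement within a cell keeps any one cell from carrying the same [PR-X] pair twice. Different cells can still share pairs.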



69. The library according to any of the preceding claims 67 to 68, wherein each cell 
comprises an individual selection of combinations of promoters and 
heterologous expressible nucleotide sequences drawn individually from the 
same pool of promoters and heterologous expressible nucleotide sequences. 

70. The library according to any of the preceding claims 67 to 69, wherein the library
comprises at least 2 different independently controllable promoters, such as at
least 3, for example at least 4, such as at least 5, for example at least 6, such as
at least 7, for example at least 8, such as at least 9, for example at least 10,
such as at least 15, for example at least 25, such as at least 50, for example at
least 75, such as at least 100.

71. The library according to any of the preceding claims 67 to 70, comprising an
externally controllable promoter.

72. The library according to claim 71, comprising an inducible and/or a repressible 
promoter. 

73. The library according to claim 71, comprising at least one promoter comprising
both repressible and inducible elements.

74. The library according to any of the preceding claims 67 to 73, comprising at least
one promoter being chemically inducible and/or repressible and/or
inducible/repressible by temperature, and/or inducible/repressible according to
mating type and/or inducible/repressible according to physical factors and/or
inducible/repressible according to environmental factors.

75. The library according to any of the preceding claims 67 to 74, wherein at least
one promoter is induced by any factor selected from the group comprising
carbohydrates; galactose; low inorganic phosphate levels; temperature;
gaseous environment; pressure; pH; low or high temperature shift; metals or
metal ions; copper ions; hormones; dihydrotestosterone; deoxycorticosterone;
heat shock (e.g. 39 °C); methanol; redox-status; growth stage; developmental
stage; induced in MATa cells; synthetic inducers; gal inducer.

76. The library according to any of the preceding claims 67 to 75, wherein at least
one promoter is repressed by any factor selected from the group comprising
carbohydrates; galactose; low inorganic phosphate levels; temperature; low or
high temperature shift; metals or metal ions; copper ions; hormones;
dihydrotestosterone; deoxycorticosterone; heat shock (e.g. 39 °C); methanol;
redox-status; gaseous environment; pressure; pH; growth stage; developmental
stage; induced in MATa cells; synthetic inducers; gal inducer; high inorganic
phosphate levels; methionine; glycerol; repressed in MATa or a/a cells.



77. The library according to any of the preceding claims 67 to 76, comprising at least
one promoter selected from the group comprising ADH1, PGK1, GAP491, TPI,
PYK, ENO, PMA1, PHO5, GAL1, GAL2, GAL10, MET25, ADH2, MEL1, CUP1,
HSE, MFa 1/Mfa f, AOX, MOX, SV40, CaMV, Opaque-2, GRE, ARE,
PGK/ARE hybrid, CYC/GRE hybrid, TPI/α2 operator, AOX1, MOX A.
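The following sketch illustrates what "independently controllable" means for three promoters from the claim 77 list. It is a deliberate simplification and does not model real promoter kinetics. Each promoter responds to its own signal, so a single growth condition can switch one group of expressible sequences on while leaving another off. The on/off rules are textbook simplifications: GAL1 is induced by galactose and repressed by glucose, CUP1 is induced by copper ions, and MET25 is repressed by methionine. The `Conditions` fields and the `active_promoters` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conditions:
    """Hypothetical growth-medium description (True = component present)."""
    galactose: bool = False
    glucose: bool = False
    copper: bool = False
    methionine: bool = False


# Simplified on/off rules; real responses are graded and strain-dependent.
RULES = {
    "GAL1": lambda c: c.galactose and not c.glucose,  # galactose-induced, glucose-repressed
    "CUP1": lambda c: c.copper,                       # copper-ion induced
    "MET25": lambda c: not c.methionine,              # methionine-repressed
}


def active_promoters(c: Conditions) -> set[str]:
    """Return the promoters whose rule evaluates to 'on' under conditions c."""
    return {name for name, rule in RULES.items() if rule(c)}


print(sorted(active_promoters(Conditions(galactose=True, methionine=True))))
# ['GAL1']
print(sorted(active_promoters(Conditions(copper=True))))
# ['CUP1', 'MET25']
```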

78. The library according to claim 67, wherein at least one promoter is a synthetic 
promoter. 

79. The library according to any of the preceding claims 67 to 78, wherein at least 
one heterologous expressible nucleotide sequence is found in at least 2 cells, 
such as at least 3 cells, for example at least 5 cells, such as at least 10 cells, for 
example at least 25 cells, such as at least 50 cells, for example at least 100 cells,
such as at least 500 cells, for example at least 1000 cells. 

80. The library according to any of the preceding claims 67 to 79, wherein at least 
one cell comprises a group of heterologous expressible nucleotide sequences 
under the control of a first promoter, the group comprising at least 5 
heterologous expressible nucleotide sequences, such as at least 10 
heterologous expressible nucleotide sequences, for example at least 15 
heterologous expressible nucleotide sequences, such as at least 25 
heterologous expressible nucleotide sequences, for example at least 50
heterologous expressible nucleotide sequences, such as at least 75 
heterologous expressible nucleotide sequences, for example at least 100 
heterologous expressible nucleotide sequences, such as at least 250 
heterologous expressible nucleotide sequences, for example at least 500 
heterologous expressible nucleotide sequences. 

81. The library according to claim 80, wherein a cell comprises at least a second
group of heterologous expressible nucleotide sequences under the independent
control of a second promoter, such as at least a third group of heterologous
expressible nucleotide sequences under the independent control of a third
promoter, for example at least a fourth group of heterologous expressible
nucleotide sequences under the independent control of a fourth promoter, such
as at least a fifth group of heterologous expressible nucleotide sequences under
the independent control of a fifth promoter, for example at least a sixth group of
heterologous expressible nucleotide sequences under the independent control
of a sixth promoter, such as at least a seventh group of heterologous
expressible nucleotide sequences under the independent control of a seventh
promoter, such as at least an eighth group of heterologous expressible nucleotide
sequences under the independent control of an eighth promoter, for example at
least a ninth group of heterologous expressible nucleotide sequences under the
independent control of a ninth promoter, such as at least a tenth group of
heterologous expressible nucleotide sequences under the independent control
of a tenth promoter.

82. The library according to any of the preceding claims 67 to 81, wherein the 
different expressible nucleotide sequences come from the same or from different 
expression states. 






83. The library according to claim 82, wherein the different expression states 
represent at least two different tissues, such as at least two organs, such as at 
least two species, such as at least two genera. 



84. The library according to claim 83, wherein the different species are from at least
two different phyla, such as from at least two different classes, such as from at
least two different divisions, more preferably from at least two different
sub-kingdoms, such as from at least two different kingdoms.

85. The library according to any of the preceding claims 67 to 84, wherein the
nucleotide sequence of at least one cassette, preferably the nucleotide
sequence of substantially all cassettes, has been designed to minimise the
level of repeat sequences in any one cassette.

86. The library according to claim 85, wherein recombination within the expressible
nucleotide sequences has been minimised.






87. The library according to any of the preceding claims 67 to 86, wherein at least
one cassette is ligated into a plasmid or into an integration vector, such as a
plasmid vector, a phage vector, a viral vector or a cosmid vector.

88. The library according to any of the preceding claims 67 to 87, wherein at least
one cassette is integrated into the host genome.

89. The library according to any of the preceding claims 67 to 88, wherein at least
one cassette is integrated into an artificial chromosome in the host cell.

90. The library according to any of the preceding claims 67 to 89, said library 
comprising a collection of sub-libraries. 


91. The library according to claim 90, wherein a sub-library is a collection of 
individual cells having at least one phenotype in common. 

92. The library according to claim 91, wherein the at least one phenotype is selected
from the group comprising the ability to grow on unusual substrates, the ability to
grow on sublethal concentrations of toxins, the ability to grow at a high
temperature, the ability to grow at a low temperature, the ability to grow at
elevated osmolality, the ability to grow at low osmolality, the ability to grow at
high salinity, the ability to grow at low salinity, the ability to grow at elevated
metal concentrations, the ability to grow at high CO2 concentrations, the ability
to grow at low CO2 concentrations, the ability to grow at high O2 concentrations,
the ability to grow at low O2 concentrations, the ability to provide special spectral
properties, the ability to provide a special colour, the ability to have a deviating
GST activity, the ability to have a deviating P450 activity.


93. The library according to claim 90, wherein a sublibrary is a collection of 
individual cells, said cells having - for at least one identical expressible DNA 
sequence, more preferably for substantially all identical expressible nucleotide 
sequences - different promoters. 



94. The library according to claim 90, wherein a sublibrary is a collection of 
individual cells, each cell having - in at least one cassette of the concatemer - 
identical expressible DNA sequences. 
95. The library according to claim 67, comprising at least 20 individual cells, such as
at least 50 individual cells. 

96. The library according to claim 67, comprising at least 100 individual cells, such
as at least 1,000 cells, for example at least 10,000 cells, such as at least 100,000
cells, for example at least 1,000,000 cells, such as at least 1,000,000,000 cells.

97. The library according to claim 67, comprising a collection of cells from one 
species. 


98. The library according to claim 97, wherein said species is selected from 
prokaryotic species or mutants thereof. 

99. The library according to claim 98, wherein said prokaryotic species is selected
from Escherichia coli, Bacillus subtilis, Streptomyces lividans, Streptomyces
coelicolor, Pseudomonas aeruginosa, Myxococcus xanthus.

100. The library according to claim 97, wherein said species is selected 
from eukaryotic species or mutants thereof. 


101. The library according to claim 100, wherein said eukaryotic species is
selected from mammals, insects, vertebrates, plants, fungi, yeasts;
filamentous ascomycetes such as Neurospora crassa and Aspergillus
nidulans; plant cells such as those derived from Nicotiana and Arabidopsis;
mammalian host cells such as those derived from humans, monkeys and
rodents, such as Chinese hamster ovary (CHO) cells, NIH/3T3, COS, 293,
VERO, HeLa.

102. The library according to claim 100, wherein said eukaryotic species is
selected from fungi or mutants thereof.

103. The library according to claim 102, said yeast being selected from the
group comprising budding yeast, Kluyveromyces marxianus, K. lactis, Candida
utilis, Phaffia rhodozyma, Saccharomyces boulardii, Pichia pastoris, Hansenula
polymorpha, Yarrowia lipolytica, Candida paraffinica, Schwanniomyces castellii,
Pichia stipitis, Candida shehatae, Rhodotorula glutinis, Lipomyces lipofer,
Cryptococcus curvatus, Candida spp. (e.g. C. palmioleophila), Yarrowia
lipolytica, Candida guilliermondii, Candida, Rhodotorula spp., Saccharomycopsis
spp., Aureobasidium pullulans, Candida brumptii, Candida hydrocarbofumarica,
Torulopsis, Candida tropicalis, Saccharomyces cerevisiae, Rhodotorula rubra,
Candida flaveri, Eremothecium ashbyii, Pichia spp., Kluyveromyces, Hansenula,
Kloeckera, Pichia, Pachysolen spp., or Torulopsis bombicola, or mutants
thereof.

104. A library comprising at least one library or at least one sub-library as
defined in any of claims 1 to 103.



105. A method of producing a library comprising a collection of individual
cells, comprising the steps:

i) providing a population of nucleotide cassettes having the general
formula [rs2-SP-PR-X-TR-SP-rs1], wherein rs1 and rs2 together denote
a restriction site, SP denotes a spacer of at least two bases, X
denotes an expressible nucleotide sequence, PR denotes a
promoter capable of regulating the expression of X in the cell, and TR
denotes a terminator,

ii) assembling random sub-sets of the cassettes into concatemers
comprising at least two cassettes,

iii) ligating the concatemers into vectors,

iv) introducing the vectors into host cells, and

v) mixing at least two cells so that at least one concatemer of a first cell
comprises a random sub-set of cassettes that is different from a
random sub-set of cassettes of a concatemer of a second cell.
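Below is a toy model of steps ii) to v) of claim 105, in which each cassette is represented by a string label of the form [rs2-SP-PR-X-TR-SP-rs1]. The pool contents, the concatemer size range of 2 to 4 cassettes and the function names are hypothetical. Steps iii) and iv) are collapsed into one: each concatemer stands for the vector-borne insert of one cell. The sketch also enforces a stronger condition than step v) requires. No two cells receive the same sub-set of cassettes, which guarantees that at least two cells differ.

```python
import random
from math import comb


def cassette(pr: str, x: str) -> str:
    """Hypothetical string label for one [rs2-SP-PR-X-TR-SP-rs1] cassette."""
    return f"rs2-SP-{pr}-{x}-TR-SP-rs1"


def random_concatemer(pool, rng, min_len, max_len):
    """Step ii: a random sub-set of at least min_len cassettes, in random order."""
    k = rng.randint(min_len, max_len)
    return tuple(rng.sample(pool, k))


def build_library(pool, n_cells, min_len=2, max_len=4, seed=1):
    """Steps iii to v, simplified: one concatemer per cell, no sub-set used twice."""
    max_len = min(max_len, len(pool))
    possible = sum(comb(len(pool), k) for k in range(min_len, max_len + 1))
    if n_cells > possible:
        raise ValueError("more cells requested than distinct cassette sub-sets")
    rng = random.Random(seed)
    cells, seen = [], set()
    while len(cells) < n_cells:
        concat = random_concatemer(pool, rng, min_len, max_len)
        key = frozenset(concat)  # the sub-set, ignoring cassette order
        if key not in seen:
            seen.add(key)
            cells.append(concat)
    return cells


pool = [cassette("PR1", "X1"), cassette("PR2", "X2"),
        cassette("PR3", "X3"), cassette("PR1", "X4")]
cells = build_library(pool, n_cells=5)
for i, concat in enumerate(cells, start=1):
    print(f"cell{i}: {len(concat)} cassettes")
```

Because repeated sub-sets are rejected, the function checks the number of possible sub-sets up front. It raises an error instead of looping forever when more cells are requested than distinct sub-sets exist.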



106. The method according to claim 105, whereby the vectors comprise a
plasmid or an integration vector, such as a plasmid vector, a phage vector, a
viral vector or a cosmid vector.

107. The method according to claim 106, whereby the vectors comprise an 
artificial chromosome. 



108. A method of producing a library comprising a collection of individual
cells, comprising the steps:

i) inserting at least two expressible nucleotides into the cloning site of
at least two primary vectors comprising a cassette, the cassette
comprising a nucleotide sequence of the general formula in the 5'->3'
direction: [RS1-RS2-SP-PR-CS-TR-SP-RS2'-RS1'], wherein RS1 and
RS1' denote restriction sites, RS2 and RS2' denote restriction sites different
from RS1 and RS1', SP denotes a spacer sequence of at least two
nucleotides, PR denotes a promoter, CS denotes a cloning site, and
TR denotes a terminator,

ii) excising the cassettes using at least one restriction enzyme specific for
RS1, RS1', RS2 and RS2', obtaining expression cassettes having the
general formula [rs2-SP-PR-X-TR-SP-rs1], wherein rs1 and rs2
together denote a restriction site, and wherein X denotes an
expressible nucleotide sequence,

iii) inserting the expression cassettes into a vector,

iv) transferring the expression cassettes into at least two host cells, and

v) mixing at least two host cells having different cassettes.
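Step ii) of claim 108 leaves each excised cassette flanked by two halves of a restriction site: rs2 at the 5' end and rs1 at the 3' end. The single-enzyme sketch below uses AscI purely as an example (recognition sequence GGCGCGCC, cut after GG on the top strand). The toy vector sequence and the function names are hypothetical. Only top-strand cut positions are modelled; the 4-nt overhang is not.

```python
def cut_positions(seq: str, site: str, cut_offset: int) -> list[int]:
    """Top-strand cut coordinates for every occurrence of `site` in `seq`."""
    seq, site = seq.lower(), site.lower()
    positions, start = [], 0
    while (i := seq.find(site, start)) != -1:
        positions.append(i + cut_offset)
        start = i + 1  # also catches overlapping occurrences
    return positions


def excise(seq: str, site: str, cut_offset: int) -> list[str]:
    """Split a linear sequence at every cut; inner fragments are excised cassettes."""
    cuts = cut_positions(seq, site, cut_offset)
    bounds = [0, *cuts, len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


# AscI (example enzyme only): recognition GGCGCGCC, top-strand cut after GG.
ASCI_SITE, ASCI_CUT = "ggcgcgcc", 2

# Toy linear vector: flank + site + stand-in cassette + site + flank.
vector = "aaaa" + ASCI_SITE + "acgtacgt" + ASCI_SITE + "tttt"
frags = excise(vector, ASCI_SITE, ASCI_CUT)
print(frags)  # ['aaaagg', 'cgcgccacgtacgtgg', 'cgcgcctttt']
```

Joining the 3' end of the middle fragment (gg) to its 5' end (cgcgcc) restores ggcgcgcc. This is the sense in which rs1 and rs2 "together denote a restriction site".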

109. A method of producing a library comprising a collection of individual
cells, comprising the steps:

i) providing at least one expressible nucleotide sequence,

ii) ligating at least one expressible nucleotide sequence to a controllable
promoter capable of functioning in a host cell, obtaining a first
expression construct,

iii) ligating at least one expressible nucleotide sequence to another
independently controllable promoter capable of functioning in a host
cell, obtaining a second expression construct,

iv) inserting the constructs of steps ii) and iii) into at least two host cells, and

v) mixing at least two cells having a different combination of
independently controllable promoter and expressible nucleotide
sequences.






110. The method according to claim 109, wherein expression constructs are 

concatenated prior to insertion into host cells. 
111. An expression library obtainable by the method of any of claims
105 to 110.
Fig. 2a

Fig. 6: EVES entry vector (labelled features include AmpR and ColE1)

WO 02/059297 PCT/DK02/00056

Fig. 7: pYAC4-AscI, vector for providing EVACS arms (labelled features include CEN4, AscI, BamHI and HIS3)

Fig. 8

Fig. 9a

Fig. 9b

Fig. 10

SEQUENCE LISTING 

<110> Evolva Biotech AS 
Goldsmith, Neil 
Sorensen, Alexandra M. P. 
Nielsen, Soren V.S. 

<120> A library of a collection of cells 

<130> P 501 PCOO 

<150> DK PA 2001 00128 
<151> 2001-01-25 

<150> DK PA 2001 00679 
<151> 2001-05-01 

<150> US 60/300,863 
<151> 2001-06-27 

<160> 4 

<170> PatentIn version 3.1

<210> 1 

<211> 3417 

<212> DNA 

<213> Synthetic 

<220>
<221> misc_feature
<222> (1902)..(2759)
<223> Ampicillin resistance gene

<220>
<221> rep_origin
<222> (959)..(1899)
<223> ColE1

<220>
<221> misc_feature
<222> (2891)..(3347)
<223> f1-phage origin of replication

<220>
<221> terminator
<222> (495)..(823)
<223> ADH1

<220>
<221> promoter
<222> (49)..(437)
<223> Met25 promoter

<400> 1
ctgatttgcc cgggcagttc aggctcatca ggcgcgccat gcagggattc ttcggatgca 60
agggttcgaa tcccttagct ctcattattt tttgcttttt ctcttgaggt cacatgatcg 120
caaaatggca aatggcacgt gaagctgtcg atattgggga actgtggtgg ttggcaaatg 180
actaattaag ttagtcaagg cgccatcctc atgaaaactg tgtaacataa taaccgaagt 240
gtcgaaaagg tggcaccttg tccaattgaa cacgctcgat gaaaaaaata agatatatat 300
aaggttaagt aaagcgtctg ttagaaagga agtttttcct ttttcttgct ctcttgtctt 360
ttcatctact atttccttcg tgtaatacag ggtcgtcaga tacatagata caattctatt 420
acccccatcc atacaagctt ggcgccgaat tcgtcgaccc ggggatccgc ggccgcaggc 480
ctaaattgat ctagagcttt ggacttcttc gccagaggtt tggtcaagtc tccaatcaag 540
gttgtcggct tgtctacctt gccagaaatt tacgaaaaga tggaaaaggg tcaaatcgtt 600
ggtagatacg ttgttgacac ttctaaataa gcgaatttct tatgatttat gatttttatt 660
attaaataag ttataaaaaa aataagtgta tacaaatttt aaagtgactc ttaggtttta 720
aaacgaaaat tcttgttctt gagtaactct ttcctgtagg tcaggttgct ttctcaggta 780
tagcatgagg tcgctcttat tgaccacacc tctaccggca tgcccatggg ttaactgatc 840
aatgcatcct gcatggcgcg cctgatgagc ctgaactgcc cgggcaaatc agctggacgt 900
ctgcctgcat taatgaatcg gccaacgcgc ggggagaggc ggtttgcgta ttgggcgctc 960
ttccgcttcc tcgctcactg actcgctgcg ctcggtcgtt cggctgcggc gagcggtatc 1020
agctcactca aaggcggtaa tacggttatc cacagaatca ggggataacg caggaaagaa 1080
catgtgagca aaaggccagc aaaaggccag gaaccgtaaa aaggccgcgt tgctggcgtt 1140
tttccatagg ctccgccccc ctgacgagca tcacaaaaat cgacgctcaa gtcagaggtg 1200
gcgaaacccg acaggactat aaagatacca ggcgtttccc cctggaagct ccctcgtgcg 1260
ctctcctgtt ccgaccctgc cgcttaccgg atacctgtcc gcctttctcc cttcgggaag 1320
cgtggcgctt tctcatagct cacgctgtag gtatctcagt tcggtgtagg tcgttcgctc 1380
caagctgggc tgtgtgcacg aaccccccgt tcagcccgac cgctgcgcct tatccggtaa 1440
ctatcgtctt gagtccaacc cggtaagaca cgacttatcg ccactggcag cagccactgg 1500
taacaggatt agcagagcga ggtatgtagg cggtgctaca gagttcttga agtggtggcc 1560
taactacggc tacactagaa ggacagtatt tggtatctgc gctctgctga agccagttac 1620
cttcggaaaa agagttggta gctcttgatc cggcaaacaa accaccgctg gtagcggtgg 1680
tttttttgtt tgcaagcagc agattacgcg cagaaaaaaa ggatctcaag aagatccttt 1740
gatcttttct acggggtctg acgctcagtg gaacgaaaac tcacgttaag ggattttggt 1800
catgagatta tcaaaaagga tcttcaccta gatcctttta aattaaaaat gaagttttaa 1860
atcaatctaa agtatatatg agtaaacttg gtctgacagt taccaatgct taatcagtga 1920

ggcacctatc tcagcgatct gtctatttcg ttcatccata gttgcctgac tccccgtcgt 1980 

gtagataact acgatacggg agggcttacc atctggcccc agtgctgcaa tgataccgcg 2040 

agacccacgc tcaccggctc cagatttatc agcaataaac cagccagccg gaagggccga 2100 

gcgcagaagt ggtcctgcaa ctttatccgc ctccatccag tctattaatt gttgccggga 2160 

agctagagta agtagttcgc cagttaatag tttgcgcaac gttgttgcca ttgctacagg 2220 

catcgtggtg tcacgctcgt cgtttggtat ggcttcattc agctccggtt cccaacgatc 2280

aaggcgagtt acatgatccc ccatgttgtg caaaaaagcg gttagctcct tcggtcctcc 2340 

gatcgttgtc agaagtaagt tggccgcagt gttatcactc atggttatgg cagcactgca 2400 

taattctctt actgtcatgc catccgtaag atgcttttct gtgactggtg agtactcaac 2460 

caagtcattc tgagaatagt gtatgcggcg accgagttgc tcttgcccgg cgtcaatacg 2520 

ggataatacc gcgccacata gcagaacttt aaaagtgctc atcattggaa aacgttcttc 2580 

ggggcgaaaa ctctcaagga tcttaccgct gttgagatcc agttcgatgt aacccactcg 2640

tgcacccaac tgatcttcag catcttttac tttcaccagc gtttctgggt gagcaaaaac 2700 

aggaaggcaa aatgccgcaa aaaagggaat aagggcgaca cggaaatgtt gaatactcat 2760

actcttcctt tttcaatatt attgaagcat ttatcagggt tattgtctca tgagcggata 2820

catatttgaa tgtatttaga aaaataaaca aataggggtt ccgcgcacat ttccccgaaa 2880 

agtgccacct gacgcgccct gtagcggcgc attaagcgcg gcgggtgtgg tggttacgcg 2940

cagcgtgacc gctacacttg ccagcgccct agcgcccgct cctttcgctt tcttcccttc 3000 

ctttctcgcc acgttcgccg gctttccccg tcaagctcta aatcgggggc tccctttagg 3060 

gttccgattt agtgctttac ggcacctcga ccccaaaaaa cttgattagg gtgatggttc 3120

acgtagtggg ccatcgccct gatagacggt ttttcgccct ttgacgttgg agtccacgtt 3180 

ctttaatagt ggactcttgt tccaaactgg aacaacactc aaccctatct cggtctattc 3240

ttttgattta taagggattt tgccgatttc ggcctattgg ttaaaaaatg agctgattta 3300

acaaaaattt aacgcgaatt ttaacaaaat attaacgctt acaatttcca ttcgccattc 3360 

aggctgcgca actgttggga agggcgatcg gtgcgggcct cttcgctatt acgccag 3417 

<210> 2 
<211> 3501 
<212> DNA 
<213> Synthetic 

<220>
<221> misc_feature
<222> (1986)..(2843)
<223> Ampicillin resistance gene

<220>
<221> rep_origin
<222> (1043)..(1983)
<223> ColE1

<220>
<221> misc_feature
<222> (2975)..(3431)
<223> f1-phage origin of replication

<220>
<221> terminator
<222> (579)..(907)
<223> ADH1

<220>
<221> promoter
<222> (49)..(519)
<223> Cup1 promoter

<400> 2 
ctgatttgcc cgggcagttc aggctcatca ggcgcgccat gcagggataa gccgatccca 60
ttaccgacat ttgggcgcta tacgtgcata tgttcatgta tgtatctgta tttaaaacac 120
ttttgtatta tttttcctca tatatgtgta taggtttata cggatgattt aattattact 180
tcaccaccct ttatttcagg ctgatatctt agccttgtta ctagttagaa aaagacattt 240
ttgctgtcag tcactgtcaa gagattcttt tgctggcatt tcttctagaa gcaaaaagag 300
cgatgcgtct tttccgctga accgttccag caaaaaagac taccaacgca atatggattg 360
tcagaatcat ataaaagaga agcaaataac tccttgtctt gtatcaattg cattataata 420
tcttcttgtt agtgcaatat catatagaag tcatcgaaat agatattaag aaaaacaaac 480
tgtacaatca atcaatcaat catcacataa aatgttcaaa gcttggcgcc gaattcgtcg 540
acccggggat ccgcggccgc aggcctaaat tgatctagag ctttggactt cttcgccaga 600
ggtttggtca agtctccaat caaggttgtc ggcttgtcta ccttgccaga aatttacgaa 660
aagatggaaa agggtcaaat cgttggtaga tacgttgttg acacttctaa ataagcgaat 720
ttcttatgat ttatgatttt tattattaaa taagttataa aaaaaataag tgtatacaaa 780
ttttaaagtg actcttaggt tttaaaacga aaattcttgt tcttgagtaa ctctttcctg 840
taggtcaggt tgctttctca ggtatagcat gaggtcgctc ttattgacca cacctctacc 900
ggcatgccca tgggttaact gatcaatgca tcctgcatgg cgcgcctgat gagcctgaac 960
tgcccgggca aatcagctgg acgtctgcct gcattaatga atcggccaac gcgcggggag 1020
aggcggtttg cgtattgggc gctcttccgc ttcctcgctc actgactcgc tgcgctcggt 1080
cgttcggctg cggcgagcgg tatcagctca ctcaaaggcg gtaatacggt tatccacaga 1140
atcaggggat aacgcaggaa agaacatgtg agcaaaaggc cagcaaaagg ccaggaaccg 1200
taaaaaggcc gcgttgctgg cgtttttcca taggctccgc ccccctgacg agcatcacaa 1260
aaatcgacgc tcaagtcaga ggtggcgaaa cccgacagga ctataaagat accaggcgtt 1320
tccccctgga agctccctcg tgcgctctcc tgttccgacc ctgccgctta ccggatacct 1380
gtccgccttt ctcccttcgg gaagcgtggc gctttctcat agctcacgct gtaggtatct 1440
cagttcggtg taggtcgttc gctccaagct gggctgtgtg cacgaacccc ccgttcagcc 1500
cgaccgctgc gccttatccg gtaactatcg tcttgagtcc aacccggtaa gacacgactt 1560
atcgccactg gcagcagcca ctggtaacag gattagcaga gcgaggtatg taggcggtgc 1620
tacagagttc ttgaagtggt ggcctaacta cggctacact agaaggacag tatttggtat 1680
ctgcgctctg ctgaagccag ttaccttcgg aaaaagagtt ggtagctctt gatccggcaa 1740
acaaaccacc gctggtagcg gtggtttttt tgtttgcaag cagcagatta cgcgcagaaa 1800
aaaaggatct caagaagatc ctttgatctt ttctacgggg tctgacgctc agtggaacga 1860
aaactcacgt taagggattt tggtcatgag attatcaaaa aggatcttca cctagatcct 1920
tttaaattaa aaatgaagtt ttaaatcaat ctaaagtata tatgagtaaa cttggtctga 1980
cagttaccaa tgcttaatca gtgaggcacc tatctcagcg atctgtctat ttcgttcatc 2040
catagttgcc tgactccccg tcgtgtagat aactacgata cgggagggct taccatctgg 2100
ccccagtgct gcaatgatac cgcgagaccc acgctcaccg gctccagatt tatcagcaat 2160
aaaccagcca gccggaaggg ccgagcgcag aagtggtcct gcaactttat ccgcctccat 2220
ccagtctatt aattgttgcc gggaagctag agtaagtagt tcgccagtta atagtttgcg 2280
caacgttgtt gccattgcta caggcatcgt ggtgtcacgc tcgtcgtttg gtatggcttc 2340
attcagctcc ggttcccaac gatcaaggcg agttacatga tcccccatgt tgtgcaaaaa 2400
agcggttagc tccttcggtc ctccgatcgt tgtcagaagt aagttggccg cagtgttatc 2460
actcatggtt atggcagcac tgcataattc tcttactgtc atgccatccg taagatgctt 2520
ttctgtgact ggtgagtact caaccaagtc attctgagaa tagtgtatgc ggcgaccgag 2580
ttgctcttgc ccggcgtcaa tacgggataa taccgcgcca catagcagaa ctttaaaagt 2640
gctcatcatt ggaaaacgtt cttcggggcg aaaactctca aggatcttac cgctgttgag 2700
atccagttcg atgtaaccca ctcgtgcacc caactgatct tcagcatctt ttactttcac 2760
cagcgtttct gggtgagcaa aaacaggaag gcaaaatgcc gcaaaaaagg gaataagggc 2820
gacacggaaa cgctgaacac tcatactctt cctttttcaa tattattgaa gcatttatca 2880
gggttattgt ctcatgagcg gatacatatt tgaatgtatt tagaaaaata aacaaatagg 2940
ggttccgcgc acatttcccc gaaaagtgcc acctgacgcg ccctgtagcg gcgcattaag 3000
cgcggcgggt gtggtggtta cgcgcagcgt gaccgctaca cttgccagcg ccctagcgcc 3060

cgctcctttc gctttcttcc cttcctttct cgccacgttc gccggctttc cccgtcaagc 3120 

tctaaatcgg gggctccctt tagggttccg atttagtgct ttacggcacc tcgaccccaa 3180 

aaaacttgat tagggtgatg gttcacgtag tgggccatcg ccctgataga cggtttttcg 3240 

ccctttgacg ttggagtcca cgttctttaa tagtggactc ttgttccaaa ctggaacaac 3300 

actcaaccct atctcggtct attcttttga tttataaggg attttgccga tttcggccta 3360 

ttggttaaaa aatgagctga tttaacaaaa atttaacgcg aattttaaca aaatattaac 3420 

gcttacaatt tccattcgcc attcaggctg cgcaactgtt gggaagggcg atcggtgcgg 3480

gcctcttcgc tattacgcca g 3501 

<210> 3 

<211> 4188 

<212> DNA 

<213> Synthetic 

<220>
<221> misc_feature
<222> (2673)..(3530)
<223> Ampicillin resistance gene

<220>
<221> rep_origin
<222> (1730)..(2670)
<223> ColE1

<220>
<221> misc_feature
<222> (3662)..(4118)
<223> f1-phage origin of replication

<220>
<221> terminator
<222> (1027)..(1355)
<223> ADH1

<220>
<221> promoter
<222> (582)..(969)
<223> Met25 promoter

<220>
<221> misc_feature
<222> (1365)..(1603)
<223> ARS1 (autonomous replicating sequence) for yeast replication

<220>
<221> misc_feature
<222> (49)..(574)
<223> lambda spacer DNA (22428-22923)

<400> 3 



ctgatttgcc 


cgggcagttc 


aggctcatca 


ggcgcgccat 


gcagggattc 


tggaaattgc 


60 


aacgaaggaa 


gaaacctcgt 


tgctggaagc 


ctggaagaag 


tatcgggtgt 


tgctgaaccg 


120 


tgttgataca 


tcaactgcac 


ctgatattga 


gtggcctgct 


gtccctgtta 


tggagtaatc 


180 


gttttgtgat 


atgccgcaga 


aacgttgtat 


gaaataacgt 


tctgcggtta 


gttagtatat 


240 


tgtaaagctg 


agtattggtt 


tatttggcga 


ttattatctt 


caggagaata 


atggaagttc 


300 


tatgactcaa 


ttgttcatag 


tgtttacatc 


accgccaatt 


gcttttaaga 


ctgaacgcat 


360 


gaaatatggt 


ttttcgtcat 


gttttgagtc 


tgctgttgat 


atttctaaag 


tcggtttttt 


420 


ttcttcgttt 


tctctaacta 


ttttccatga 


aatacatttt 


tgattattat 


ttgaatcaat 


480 


tccaattacc 


tgaagtcttt 


catctataat 


tggcattgta 


tgtattggtt 


tattggagta 


540 


gatgcttgct 


tttctgagcc 


atagctctga 


tatcagatct 


tcttcggatg 


caagggttcg 


600 


aatcccttag 


ctctcattat 


tttttgcttt 


ttctcttgag 


gtcacatgat 


cgcaaaatgg 


660 


caaatggcac 


gtgaagctgt 


cgatattggg 


gaactgtggt 


ggttggcaaa 


tgactaatta 


720 


agttagtcaa 


ggcgccatcc 


tcatgaaaac 


tgtgtaacat 


aataaccgaa 


gtgtcgaaaa 


780 


ggtggcacct 


tgtccaattg 


aacacgctcg 


atgaaaaaaa 


taagatatat 


ataaggttaa 


840 


gtaaagcgtc 


tgttagaaag 


gaagtttttc 


ctttttcttg 


ctctcttgtc 


ttttcatcta 


900 


ctatttcctt 


cgtgtaatac 


agggtcgtca 


gatacataga 


tacaattcta 


ttacccccat 


960 


ccatacaagc 


ttggcgccga 


attcgtcgac 


ccggggatcc 


gcggccgcag 


gcctaaattg 


1020 


atctagagct 


ttggacttct 


tcgccagagg 


tttggtcaag 


tctccaatca 


aggttgtcgg 


1080 


cttgtctacc 


ttgccagaaa 


tttacgaaaa 


gatggaaaag 


ggtcaaatcg 


ttggtagata 


1140 


cgttgttgac 


acttctaaat 


aagcgaattt 


cttatgattt 


atgattttta 


ttattaaata 


1200 


agttataaaa 


aaaataagtg 


tatacaaatt 


ttaaagtgac 


tcttaggttt 


taaaacgaaa 


1260 


attcttgttc 


ttgagtaact 


ctttcctgta 


ggtcaggttg 


ctttctcagg 


tatagcatga 


1320 


ggtcgctctt 


attgaccaca 


cctctaccgg 


catgcccatg 


ggttcttttg 


aaaagcaagc 


1380 


ataaaagatc 


taaacataaa 


atctgtaaaa 


taacaagatg 


taaagataat 


gctaaatcat 


1440 


ttggcttttt 


gattgattgt 


acaggaaaat 


atacatcgca 


gggggttgac 


ttttaccatt 


1500 


tcaccgcaat 


ggaatcaaac 


ttgttgaaga 


gaatgttcac 


aggcgcatac 


gctacaatga 


1560 


cccgattctt 


gctagccttt 


tctcggtctt 


gcaaacaacc 


gccaactgat 


caatgcatcc 


1620 
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ccgggcaaat 


cagctggacg 


tctgcctgca 


1680 


ttaatgaatc 


ggccaacgcg 


cggggagagg 


cggtttgcgt 


attgggcgct 


cttccgcttc 


1740 


ctcgctcact 


gactcgctgc 


gctcggtcgt 


tcggctgcgg 


cgagcggtat 


cagctcactc 


1800 


aaaggcggta 


atacggttat 


ccacagaatc 


aggggataac 


gcaggaaaga 


acatgtgagc 


1860 


aaaaggccag 


caaaaggcca 


ggaaccgtaa 


aaaggccgcg 


ttgctggcgt 


ttttccatag 


1920 


gctccgcccc 


cctgacgagc 


atcacaaaaa 


tcgacgctca 


agtcagaggt 


ggcgaaaccc 


1980 


gacaggacta 


taaagatacc 


aggcgtttcc 


ccctggaagc 


tccctcgtgc 


gctctcctgt 


2040 


tccgaccctg 


ccgcttaccg 


gatacctgtc 


cgcctttctc 


ccttcgggaa 


gcgtggcgct 


2100 


ttctcatagc 


tcacgctgta 


ggtatctcag 


ttcggtgtag 


gtcgttcgct 


ccaagctggg 


2160 


ctgtgtgcac 


gaaccccccg 


ttcagcccga 


ccgctgcgcc 


ttatccggta 


actatcgtct 


2220 


tgagtccaac 


ccggtaagac 


acgacttatc 


gccactggca 


gcagccactg 


gtaacaggat 


2280 


tagcagagcg 


aggtatgtag 


gcggtgctac 


agagttcttg 


aagtggtggc 


ctaactacgg 


2340 


ctacactaga 


aggacagtat 


ttggtatctg 


cgctctgctg 


aagccagtta 


ccttcggaaa 


2400 


aagagttggt 


agctcttgat 


ccggcaaaca 


aaccaccgct 


ggtagcggtg 


gtttttttgt 


2460 


ttgcaagcag 


cagattacgc 


gcagaaaaaa 


aggatctcaa 


gaagatcctt 


tgatcttttc 


2520 


tacggggtct 


gacgctcagt 


ggaacgaaaa 


ctcacgttaa 


gggattttgg 


tcatgagatt 


2580 


atcaaaaagg 


atcttcacct 


agatcctttt 


aaattaaaaa 


tgaagtttta 


aatcaatcta 


2640 


aagtatatat 


gagtaaactt 


ggtctgacag 


ttaccaatgc 


ttaatcagtg 


aggcacctat 


2700 


ctcagcgatc 


tgtctatttc 


gttcatccat 


agttgcctga 


ctccccgtcg 


tgtagataac 


2760 


tacgatacgg 


gagggcttac 


catctggccc 


cagtgctgca 


atgataccgc 


gagacccacg 


2820 


ctcaccggct 


ccagatttat 


cagcaataaa 


ccagccagcc 


ggaagggccg 


agcgcagaag 


2880 


tggtcctgca 


actttatccg 


cctccatcca 


gtctattaat 


tgttgccggg 


aagctagagt 


2940 


aagtagttcg 


ccagttaata 


gtttgcgcaa 


cgttgttgcc 


attgctacag 


gcatcgtggt 


3000 


gtcacgctcg 


tcgtttggta 


tggcttcatt 


cagctccggt 


tcccaacgat 


caaggcgagt 


3060 


tacatgatcc 


cccatgttgt 


gcaaaaaagc 


ggttagctcc 


ttcggtcctc 


cgatcgttgt 


3120 


cagaagtaag 


ttggccgcag 


tgttatcact 


catggttatg 


gcagcactgc 


ataattctct 


3180 


tactgtcatg 


ccatccgtaa 


gatgcttttc 


tgtgactggt 


gagtactcaa 


ccaagtcatt 


3240 


ctgagaatag 


tgtatgcggc 


gaccgagttg 


ctcttgcccg 


gcgtcaatac 


gggataatac 


3300 


cgcgccacat 


agcagaactt 


taaaagtgct 


catcattgga 


aaacgttctt 


cggggcgaaa 


3360 


actctcaagg 


atcttaccgc 


tgttgagatc 


cagttcgatg 


taacccactc


gtgcacccaa


3420 


ctgatcttca 


gcatctttta 


ctttcaccag 


cgtttctggg 


tgagcaaaaa 


caggaaggca 


3480 


aaatgccgca 


aaaaagggaa 


taagggcgac 


acggaaatgt 


tgaatactca 


tactcttcct 


3540 
ttttcaatat tattgaagca tttatcaggg ttattgtctc atgagcggat acatatttga 3600

atgtatttag aaaaataaac aaataggggt tccgcgcaca tttccccgaa aagtgccacc 3660 

tgacgcgccc tgtagcggcg cattaagcgc ggcgggtgtg gtggttacgc gcagcgtgac 3720

cgctacactt gccagcgccc tagcgcccgc tcctttcgct ttcttccctt cctttctcgc 3780 

cacgttcgcc ggctttcccc gtcaagctct aaatcggggg ctccctttag ggttccgatt 3840

tagtgcttta cggcacctcg accccaaaaa acttgattag ggtgatggtt cacgtagtgg 3900 

gccatcgccc tgatagacgg tttttcgccc tttgacgttg gagtccacgt tctttaatag 3960 

tggactcttg ttccaaactg gaacaacact caaccctatc tcggtctatt cttttgattt 4020 

ataagggatt ttgccgattt cggcctattg gttaaaaaat gagctgattt aacaaaaatt 4080 

taacgcgaat tttaacaaaa tattaacgct tacaatttcc attcgccatt caggctgcgc 4140

aactgttggg aagggcgatc ggtgcgggcc tcttcgctat tacgccag 4188 

<210> 4 

<211> 11466 

<212> DNA 

<213> Synthetic 

<220> 

<221> misc_feature

<222> (3560)..(4247)

<223> Tetrahymena thermophila macronuclear telomere 
<220> 

<221> misc_feature

<222> (6024)..(6711)

<223> Tetrahymena thermophila macronuclear telomere 
<220> 

<221> misc_feature

<222> (9644)..(10388)

<223> Autonomous replicating sequence
<220> 

<221> misc_feature 

<222> (10488)..(11465)

<223> Centromere IV 



<220> 

<221> rep_origin 

<222> (7198)..(7198)

<223> Origin of replication, pMB1



<220> 

<221> misc_feature

<222> (1962)..(2765)

<223> URA3, orotidine-5'-phosphate decarboxylase, coding sequence



WO 02/059297 PCT/DK02/00056



<220> 

<221> misc_feature

<222> (4893)..(5552)

<223> HIS3, imidazoleglycerolphosphate dehydratase, coding sequence
<220> 

<221> misc_feature

<222> (7956)..(8816)

<223> AP(R), beta-lactamase, ampR ampicillin resistance, coding sequence



<220> 

<221> misc_feature

<222> (9129)..(9803)

<223> TRP1, phosphoribosylanthranilate isomerase, coding sequence



<400> 4 
ttctcatgtt 


tgacagctta 


tcatcgataa 


gctttaatgc 


ggtagtttat 


cacagttaaa 


60 


ttgctaacgc 


agtcaggcac 


cgtgtatgaa 


atctaacaat 


gcgctcatcg 


tcatcctcgg 


120 


caccgtcacc 


ctggatgctg 


taggcatagg 


cttggttatg 


ccggtactgc 


cgggcctctt 


180 


gcgggatatc 


gtccattccg 


acagcatcgc 


cagtcactat 


ggcgtgctgc 


tagcgctata 


240 


tgcgttgatg 


caatttctat 


gcgcacccgt 


tctcggagca 


ctgtccgacc 


gctttggccg 


300 


ccgcccagtc 


ctgctcgctt 


cgctacttgg 


agccactatc 


gactacgcga 


tcatggcgac 


360 


cacacccgtc 


ctgtggatca 


attcccttta 


gtataaattt 


cactctgaac 


catcttggaa 


420 


ggaccggtaa 


ttatttcaaa 


tctctttttc 


aattgtatat 


gtgttatgtt 


atgtagtata 


480 


ctctttcttc 


aacaattaaa 


tactctcggt 


agccaagttg 


gtttaaggcg 


caagacttta 


540 


atttatcact 


acggaattgg 


cgcgccaatt 


ccgtaatctt 


gagatcgggc 


gttcgatcgc 


600 


cccgggagat 


ttttttgttt 


tttatgtctt 


ccattcactt 


cccagacttg 


caagttgaaa 


660 


tatttctttc 


aagggaattg 


atcctctacg 


ccggacgcat 


cgtggccggc 


atcaccggcg 


720 


ccacaggtgc 


ggttgctggc 


gcctatatcg 


ccgacatcac 


cgatggggaa 


gatcgggctc 


780 


gccacttcgg 


gctcatgagc 


gcttgtttcg 


gcgtgggtat 


ggtggcaggc 


cccgtggccg 


840 


ggggactgtt 


gggcgccatc 


tccttgcatg 


caccattcct 


tgcggcggcg 


gtgctcaacg 


900 


gcctcaacct 


actactgggc 


tgcttcctaa 


tgcaggagtc 


gcataaggga 


gagcgtcgac 


960 


cgatgccctt 


gagagccttc 


aacccagtca 


gctccttccg 


gtgggcgcgg 


ggcatgacta 


1020 


tcgtcgccgc 


acttatgact 


gtcttcttta 


tcatgcaact 


cgtaggacag 


gtgccggcag 


1080 


cgctctgggt 


cattttcggc 


gaggaccgct 


ttcgctggag 


cgcgacgatg 


atcggcctgt 


1140 


cgcttgcggt 


attcggaatc 


ttgcacgccc 


tcgctcaagc 


cttcgtcact 


ggtcccgcca 


1200 


ccaaacgttt 


cggcgagaag 


caggccatta 


tcgccggcat 


ggcggccgac 


gcgctgggct 


1260 
acgtcttgct


ggcgttcgcg 


acgcgaggct 


ggatggcctt 


ccccattatg 


attcttctcg 


1320 


cttccggcgg 


catcgggatg 


cccgcgttgc 


aggccatgct 


gtccaggcag 


gtagatgacg 


1380 


accatcaggg 


acagcttcaa 


ggatcgctcg 


cggctcttac 


cagcctaact 


tcgatcactg 


1440 


gaccgctgat 


cgtcacggcg 


atttatgccg 


cctcggcgag 


cacatggaac 


gggttggcat 


1500 


ggattgtagg 


cgccgcccta 


taccttgtct 


gcctccccgc 


gttgcgtcgc 


ggtgcatgga 


1560 


gccgggccac 


ctcgacctga 


atggaagccg 


gcggcacctc 


gctaacggat 


tcaccactcc 


1620 


aagaattgga 


gccaatcaat 


tcttgcggag 


aactgtgaat 


gcgcaaacca 


acccttggca 


1680 


gaacatatcc 


atcgcgtccg 


ccatctccag 


cagccgcacg 


cggcgcatcc 


ccccccccct 


1740 


ttcaattcaa 


ttcatcattt 


tttttttatt 


cttttttttg 


atttcggttt 


ctttgaaatt 


1800 


tttttgattc 


ggtaatctcc 


gaacagaagg 


aagaacgaag 


gaaggagcac 


agacttagat 


1860 


tggtatatat 


acgcatatgt 


agtgttgaag 


aaacatgaaa 


ttgcccagta 


ttcttaaccc 


1920 


aactgcacag 


aacaaaaacc 


tgcaggaaac 


gaagataaat 


catgtcgaaa 


gctacatata 


1980 


aggaacgtgc 


tgctactcat 


cctagtcctg 


ttgctgccaa 


gctatttaat 


atcatgcacg 


2040 


aaaagcaaac 


aaacttgtgt 


gcttcattgg 


atgttcgtac 


caccaaggaa 


ttactggagt 


2100 


tagttgaagc 


attaggtccc 


aaaatttgtt 


tactaaaaac 


acatgtggat 


atcttgactg 


2160 


atttttccat 


ggagggcaca


gttaagccgc 


taaaggcatt 


atccgccaag 


tacaattttt 


2220 


tactcttcga 


agacagaaaa 


tttgctgaca 


ttggtaatac 


agtcaaattg 


cagtactctg 


2280 


cgggtgtata 


cagaatagca 


gaatgggcag 


acattacgaa 


tgcacacggt 


gtggtgggcc 


2340 


caggtattgt 


tagcggtttg 


aagcaggcgg 


cagaagaagt 


aacaaaggaa 


cctagaggcc 


2400 


ttttgatgtt 


agcagaattg 


tcatgcaagg 


gctccctatc 


tactggagaa 


tatactaagg 


2460 


gtactgttga 


cattgcgaag 


agcgacaaag 


attttgttat 


cggctttatt 


gctcaaagag 


2520 


acatgggtgg 


aagagatgaa 


ggttacgatt 


ggttgattat 


gacacccggt 


gtgggtttag 


2580 


atgacaaggg 


agacgcattg 


ggtcaacagt 


atagaaccgt 


ggatgatgtg 


gtctctacag 


2640 


gatctgacat 


tattattgtt 


ggaagaggac 


tatttgcaaa 


gggaagggat 


gctaaggtag 


2700 


agggtgaacg 


ttacagaaaa 


gcaggctggg 


aagcatattt 


gagaagatgc 


ggccagcaaa 


2760 


actaaaaaac 


tgtattataa 


gtaaatgcat 


gtatactaaa 


ctcacaaatt 


agagcttcaa 


2820 


tttaattata 


tcagttatta 


ctcgggcgta 


atgattttta 


taatgacgaa 


aaaaaaaaaa 


2880 


ttggaaagaa 


aagggggggg 


gggcagcgtt 


gggtcctggc 


cacgggtgcg 


catgatcgtg 


2940 


ctcctgtcgt 


tgaggacccg 


gctaggctgg 


cggggttgcc 


ttactggtta 


gcagaatgaa 


3000 


tcaccgatac 


gcgagcgaac 


gtgaagcgac 


tgctgctgca 


aaacgtctgc 


gacctgagca 


3060 


acaacatgaa 


tggtcttcgg 


tttccgtgtt 


tcgtaaagtc 


tggaaacgcg 


gaagtcagcg 


3120 
ccctgcacca ttatgttccg

cctacatctg tattaacgaa 

cgcatccata ccgccagttg 

tcagtaaccc gtatcgtgag 

agaaattccc ccttacacgg 

tggcccgctt tatcagaagc 

cggatgaaca ggcagacatc 

gccctcgagg gataagcttc 

aaaacatttt atttattgat 

tgaaaaactt ataaaaattt 

ttgataagaa cttcaatctt 

tttatgttta ttcatatata 

ttaatcattc ataaataact 

tgtcaaagaa ttattggggt 

gttggggttg gggttggggt 

gttggggttg gggttggggt 

gttggggttg gggttggggt 

gttggggttg gggttggggt 

ggtattagaa gaatatcctg 

acaccaaata tggcgatctc 

atccatctac caccagaacg 

ccaccgttgc cgtaaccacc 

ccgttgtagc cgccgttgtt 

tttcctctta agtgaggagg 

tgcacttgtt cgctcagttc 

taccacttgc cacctatcac 

ttttttctcg atcgagttca 

gcgcctcgtt cagaatgaca 

actgttcgta tacatactta 

tcgtatgctg cagctttaaa 

acatcgttgg taccattggg 

gcactctcac tacggtgatg 
gatctgcatc gcaggatgct



gcgctggcat 


tgaccctgag 


tttaccctca 


caacgttcca 


catcctctct 


cgtttcatcg 


aggcatcaag 


tgaccaaaca 


cagacattaa 


cgcttctgga 


tgtgaatcgc 


ttcacgacca 


atttttagat 


aaaatttatt 


cttttataac 


aaaaaaccct 


atgaaaacta 


caaaaaataa 


tgactagcta 


gcttagtcat 


aactattcaa 


aatattatag 


aaaaatcaaa 


gtattacatc 


tggggttggg 


gttggggttg 


tggggttggg 


gttggggttg 


tggggttggg 


gttggggttg 


tggggttggg 


gttggggttg 


tggggttggg 


gttggggttg 


attcaggtga 


aaatattgtt 


ggccttttcg 


tttcttggag 


gccgttagat 


ctgctgccac 


acgacggttg 


ttgctaaaga 


gttattgtag 


ttgctcatgt 


aacataacca 


ttctcgttgt 


agccataata 


tgaaatgctt 


cacaactaac 


tttttcccgt 


agagaaaaaa 


aaagaaaaag 


cgtatagaat 


gatgcattac 


ctgacattca 


taggtataca 


taatcggtgt 


cactacataa 


cgaggtggct 


tctcttatgg 


atcattcttg 


cctcgcagac 
gctggctacc


ctgtggaaca 


3180 


tgatttttct 


ctggtcccgc 


3240 


gtaaccgggc 


atgttcatca 


3300 


gtatcattac 


ccccatgaac 


3360 


ggaaaaaacc 


gcccttaaca 


3420 


gaaactcaac 


gagctggacg 


3480 


cgctgatgag 


ctttaccgca 


3540 


aatcatcatt 


aatttcttga 


3600 


tctaaaagtt 


tatttttgaa 


3660 


aatttttaat 


taaaataatt 


3720 


ttttgagatt 


taattaatat 


3780 


aatttaaaca 


ttttaacatc 


3840 


aataaataac 


ttttactcaa 


3900 


gggttggggt 


tggggttggg 


3960 


gggttggggt 


tggggttggg 


4020 


gggttggggt 


tggggttggg 


4080 


gggttggggt 


tggggttggg 


4140 


gggtgggaaa 


acagcattca 


4200 


gatgcgcggg 


atcctcgggg 


4260 


ctgggacatg 


tttgccatcg 


4320 


cgttgtttcc 


accgaagaaa 


4380 


agctgccacc 


gccacggcca 


4440 


tatttctggc 


acttcttggt 


4500 


tgtcgttgat 


gcttaaattt 


4560 


ttcttgttgt 


tcttacggaa 


4620 


tcctccatct 


cttttatatt 


4680 


caaaaagaaa 


aaaggaaagc 


4740 


cttgtcatct 


tcagtatcat 


4800 


tatatacaca 


tgtatatata 


4860 


gaacaccttt 


ggtggaggga 


4920 


caaccgcaag 


agccttgaac 


4980 


aatcaacgtg 


gagggtaatt 


5040 
ctgctagcct ctgcaaagct ttcaagaaaa
actttctccc tttgcaaacc aagttcgaca 
accgctctgg aaagtgcctc atccaaaggc 
cgcgccagta gggcctcttt aaaagcttga 
tgatggtcgt ctatgtgtaa gtcaccaatg 
ttggccagag catgtatcat atggtccaga 
tgcgattgtg tggcctgttc tgctactgct 
tctatcgcta ggggaccacc ctttaaagag 
atacgcttta ctagggcttt ctgctctgtc 
ttttttagta tattcttcga agaaatcaca 
gataatgcca atcgctaaga aaaaaaaaga 
aaatcattac cgaggcataa aaaaatatag 
aaaagaaaat tgcgggaaag gactgtgtta 
gatacctggc agtgactcct agcgctcacc 
cgtcatcttt cgataagttt ttcccacagc 
cgttgaatga agacaaagcg tcgtggttta 
aacaggaccg tgcagcggat cccgcgcatc 
ttctaatacc tgaatgctgt tttcccaccc 
caaccccaac cccaacccca accccaaccc 
caaccccaac cccaacccca accccaaccc 
caaccccaac cccaacccca accccaaccc 
caaccccaac cccaacccca accccaaccc 
ttctttgaca ttgagtaaaa gttatttatt 
gaatgattaa gatgttaaaa tgtttaaatt 
taaacataaa atattaatta aatctcaaaa 
ttcttatcaa aattatttta attaaaaatt 
aagtttttca ttcaaaaata aacttttaga 
aaaatgtttt tcaagaaatt aatgatgatt 
cctcgagggc tgcctcgcgc gtttcggtga 
cccggagacg gtcacagctt gtctgtaagc 
cgcgtcagcg ggtgttggcg ggtgtcgggg 
tgcgggatca


tctcgcaaga 


gagatctcct 


5100 


actgcgtacg 


gcctgttcga 


aagatctacc 


5160 


gcaaatcctg 


atccaaacct 


ttttactcca 


5220 


ccgagagcaa 


tcccgcagtc 


ttcagtggtg 


5280 


cactcaacga 


ttagcgacca 


gccggaatgc 


5340 


aaccctatac 


ctgtgtggac 


gttaatcact 


5400 


tctgcctctt 


tttctgggaa 


gatcgagtgc 


5460 


atcgcaatct 


gaatcttggt 


ttcatttgta 


5520 


atctttgcct 


tcgtttatct 


tgcctgctca 


5580 


ttactttata 


taatgtataa 


ttcattatgt 


5640 


gtcatccgct 


aggtggaaaa 


aaaaaaatga 


5700 


agtgtactag 


aggaggccaa 


gagtaataga 


5760 


tgacttccct 


gactaatgcc 


gtgttcaaac 


5820 


aagctcttaa 


aacgagaatt 


aagaaaaagt 


5880 


aaagcaatag 


tagaaaaaaa 


caatgggaaa 


5940 


aaaggaaata 


cgctcacgta 


catgctaggg 


6000 


aacaatattt 


tcacctgaat 


caggatattc 


6060 


caaccccaac 


cccaacccca 


accccaaccc 


6120 


caaccccaac 


cccaacccca 


accccaaccc 


6180 


caaccccaac 


cccaacccca 


accccaaccc 


6240 


caaccccaac 


cccaacccca 


accccaaccc 


6300 


caaccccaac 


cccaacccca 


accccaataa 


6360 


gatgtaatac 


tttgattttt 


agttatttat 


6420 


ctataatatt 


ttgaatagtt 


tatatatgaa 


6480 


atgactaagc 


tagctagtca 


aagattgaag 


6540 


ttattttttg 


tagttttcat 


aaatttttat 


6600 


agggtttttt 


gttataaaag 


atcaataaat 


6660 


aataaatttt 


atctaaaaat 


gaagcttatc 


6720 


tgacggtgaa 


aacctctgac 


acatgcagct 


6780 


ggatgccggg 


agcagacaag 


cccgtcaggg 


6840 


cgcagccatg 


acccagtcac 


gtagcgatag 


6900 
actggcttaa


ctatgcggca 


tcagagcaga 


atgcggtgtg 


aaataccgca 


cagatgcgta 


aggagaaaat 


gcttcctcgc 


tcactgactc 


gctgcgctcg 


gtcgttcggc 


cactcaaagg 


cggtaatacg 


gttatccaca 


gaatcagggg 


tgagcaaaag 


gccagcaaaa 


ggccaggaac 


cgtaaaaagg 


cataggctcc 


gcccccctga 


cgagcatcac 


aaaaatcgac 


aacccgacag 


gactataaag 


ataccaggcg 


tttccccctg 


cctgttccga 


ccctgccgct 


taccggatac 


ctgtccgcct 


gcgctttctc 


atagctcacg 


ctgtaggtat 


ctcagttcgg 


ctgggctgtg 


tgcacgaacc 


ccccgttcag 


cccgaccgct 


cgtcttgagt 


ccaacccggt 


aagacacgac 


ttatcgccac 


aggattagca 


gagcgaggta 


tgtaggcggt 


gctacagagt 


tacggctaca 


ctagaaggac 


agtatttggt 


atctgcgctc 


ggaaaaagag 


ttggtagctc 


ttgatccggc 


aaacaaacca 


tttgtttgca 


agcagcagat 


tacgcgcaga 


aaaaaaggat 


ttttctacgg 


ggtctgacgc 


tcagtggaac 


gaaaactcac 


agattatcaa 


aaaggatctt 


cacctagatc 


cttttaaatt 


atctaaagta 


tatatgagta 


aacttggtct 


gacagttacc 


cctatctcag 


cgatctgtct 


atttcgttca 


tccatagttg 


ataactacga 


tacgggaggg 


cttaccatct 


ggccccagtg 


ccacgctcac 


cggctccaga 


tttatcagca 


ataaaccagc 


agaagtggtc 


ctgcaacttt 


atccgcctcc 


atccagtcta 


agagtaagta 


gttcgccagt 


taatagtttg 


cgcaacgttg 


gtggtgtcac 


gctcgtcgtt 


tggtatggct 


tcattcagct 


cgagttacat 


gatcccccat 


gttgtgcaaa 


aaagcggtta 


gttgtcagaa 


gtaagttggc 


cgcagtgtta 


tcactcatgg 


tctcttactg 


tcatgccatc 


cgtaagatgc 


ttttctgtga 


tcattctgag 


aatagtgtat 


gcggcgaccg 


agttgctctt 


aataccgcgc 


cacatagcag 


aactttaaaa 


gtgctcatca 


cgaaaactct 


caaggatctt 


accgctgttg 


agatccagtt 


cccaactgat 


cttcagcatc 


ttttactttc 


accagcgttt 


aggcaaaatg 


ccgcaaaaaa 


gggaataagg 


gcgacacgga 
ttgtactgag


agtgcaccat 


6960 


accgcatcag 


gcgctcttcc 


7020 


tgcggcgagc 


ggtatcagct 


7080 


ataacgcagg 


aaagaacatg 


7140 


ccgcgttgct 


ggcgtttttc 


7200 


gctcaagtca 


gaggtggcga 


7260 


gaagctccct 


cgtgcgctct 


7320 


ttctcccttc 


gggaagcgtg 


7380 


tgtaggtcgt 


tcgctccaag 


7440 


gcgccttatc 


cggtaactat 


7500 


tggcagcagc 


cactggtaac 


7560 


tcttgaagtg 


gtggcctaac 


7620 


tgctgaagcc 


agttaccttc 


7680 


ccgctggtag 


cggtggtttt 


7740 


ctcaagaaga 


tcctttgatc 


7800 


gttaagggat 


tttggtcatg 


7860 


aaaaatgaag 


ttttaaatca 


7920 


aatgcttaat 


cagtgaggca 


7980 


cctgactccc 


cgtcgtgtag 


8040 


ctgcaatgat 


accgcgagac 


8100 


cagccggaag 


ggccgagcgc 


8160 


ttaattgttg 


ccgggaagct 


8220 


ttgccattgc 


tgcaggcatc 


8280 


ccggttccca 


acgatcaagg 


8340 


gctccttcgg 


tcctccgatc 


8400 


ttatggcagc 


actgcataat 


8460 


ctggtgagta 


ctcaaccaag 


8520 


gcccggcgtc 


aacacgggat 


8580 


ttggaaaacg 


ttcttcgggg 


8640 


cgatgtaacc 


cactcgtgca 


8700 


ctgggtgagc 


aaaaacagga 


8760 


aatgttgaat 


actcatactc 


8820 
ttcctttttc


aatattattg 


aagcatttat 


cagggttatt 


gtctcatgag 


cggatacata 


8880 


tttgaatgta 


tttagaaaaa 


taaacaaata 


ggggttccgc 


gcacatttcc 


ccgaaaagtg 


8940 


ccacctgacg 


tctaagaaac 


cattattatc 


atgacattaa 


cctataaaaa 


taggcgtatc 


9000 


acgaggccct 


ttcgtcttca 


agaattaatt 


cggtcgaaaa 


aagaaaagga 


gagggccaag 


9060 


agggagggca 


ttggtgacta 


ttgagcacgt 


gagtatacgt 


gattaagcac 


acaaaggcag 


9120 


cttggagtat 


gtctgttatt 


aatttcacag 


gtagttctgg 


tccattggtg 


aaagtttgcg 


9180 


gcttgcagag 


cacagaggcc 


gcagaatgtg 


ctctagattc 


cgatgctgac 


ttgctgggta 


9240 


ttatatgtgt 


gcccaataga 


aagagaacaa 


ttgacccggt 


tattgcaagg 


aaaatttcaa 


9300 


gtcttgtaaa 


agcatataaa 


aatagttcag 


gcactccgaa 


atacttggtt 


ggcgtgtttc 


9360 


gtaatcaacc 


taaggaggat 


gttttggctc 


tggtcaatga 


ttacggcatt 


gatatcgtcc 


9420 


aactgcatgg 


agatgagtcg 


tggcaagaat 


accaagagtt 


cctcggtttg 


ccagttatta 


9480 


aaagactcgt 


atttccaaaa 


gactgcaaca 


tactactcag 


tgcagcttca 


cagaaacctc 


9540 


attcgtttat 


tcccttgttt 


gattcagaag 


caggtgggac 


aggtgaactt 


ttggattgga 


9600 


actcgatttc 


tgactgggtt 


ggaaggcaag 


agagccccga 


aagcttacat 


tttatgttag 


9660 


ctggtggact 


gacgccagaa 


aatgttggtg 


atgcgcttag 


attaaatggc 


gttattggtg 


9720 


ttgatgtaag 


cggaggtgtg 


gagacaaatg 


gtgtaaaaga 


ctctaacaaa 


atagcaaatt 


9780 


tcgtcaaaaa 


tgctaagaaa 


taggttatta 


ctgagtagta 


tttatttaag 


tattgtttgt 


9840 


gcacttgcct 


gcaggccttt 


tgaaaagcaa 


gcataaaaga 


tctaaacata 


aaatctgtaa 


9900 


aataacaaga 


tgtaaagata 


atgctaaatc 


atttggcttt 


ttgattgatt 


gtacaggaaa 


9960 


atatacatcg 


cagggggttg 


acttttacca 


tttcaccgca 


atggaatcaa 


acttgttgaa 


10020 


gagaatgttc 


acaggcgcat 


acgctacaat 


gacccgattc 


ttgctagcct 


tttctcggtc 


10080 


ttgcaaacaa 


ccgccggcag 


cttagtatat 


aaatacacat 


gtacatacct 


ctctccgtat 


10140 


cctcgtaatc 


attttcttgt 


atttatcgtc 


ttttcgctgt 


aaaaacttta 


tcacacttat 


10200 


ctcaaataca 


cttattaacc 


gcttttacta 


ttatcttcta 


cgctgacagt 


aatatcaaac 


10260 


agtgacacat


attaaacaca 


gtggtttctt 


tgcataaaca 


ccatcagcct 


caagtcgtca 


10320 


agtaaagatt 


tcgtgttcat 


gcagatagat 


aacaatctat 


atgttgataa 


ttagcgttgc 


10380 


ctcatcaatg 


cgagatccgt 


ttaaccggac 


cctagtgcac 


ttaccccacg 


ttcggtccac 


10440 


tgtgtgccga


acatgctcct 


tcactatttt 


aacatgtgga 


attaattcta 


aatcctcttt 


10500 


atatgatctg 


ccgatagata 


gttctaagtc 


attgaggttc 


atcaacaatt 


ggattttctg 


10560 


tttactcgac 


ttcaggtaaa 


tgaaatgaga 


tgatacttgc 


ttatctcata 


gttaactcta 


10620 


agaggtgata 


cttatttact 


gtaaaactgt 


gacgataaaa 


ccggaaggaa 


gaataagaaa 


10680 
actcgaactg


atctataatg 


cctattttct 


gtaaagagtt 


taagctatga 


aagcctcggc 


10740 


attttggccg 


ctcctaggta 


gtgctttttt 


tccaaggaca 


aaacagtttc 


tttttcttga 


10800 


gcaggtttta 


tgtttcggta 


atcataaaca 


ataaataaat 


tatttcattt 


atgtttaaaa 


10860 


ataaaaaata 


aaaaagtatt 


ttaaattttt 


aaaaaagttg 


attataagca 


tgtgaccttt 


10920 


tgcaagcaat 


taaattttgc 


aatttgtgat 


tttaggcaaa 


agttacaatt 


tctggctcgt 


10980 


gtaatatatg 


tatgctaaag 


tgaactttta 


caaagtcgat 


atggacttag 


tcaaaagaaa 


11040 


ttttcttaaa 


aatatatagc 


actagccaat 


ttagcacttc 


tttatgagat 


atattataga 


11100 


ctttattaag 


ccagatttgt 


gtattatatg 


tatttacccg 


gcgaatcatg 


gacatacatt 


11160 


ctgaaatagg 


taatattctc 


tatggtgaga 


cagcatagat 


aacctaggat 


acaagttaaa 


11220 


agctagtact 


gttttgcagt 


aatttttttc 


ttttttataa 


gaatgttacc 


acctaaataa 


11280 


gttataaagt 


caatagttaa 


gtttgatatt 


tgattgtaaa 


ataccgtaat 


atatttgcat 


11340 


gatcaaaagg 


ctcaatgttg 


actagccagc 


atgtcaacca 


ctatattgat 


caccgatata 


11400 


tggacttcca 


caccaactag 


taatatgaca 


ataaattcaa 


gatattcttc 


atgagaatgg 


11460 


cccaga 